THE STATUS OF DIAGNOSTIC PSYCHOLOGICAL
TESTING¹


By DAVID RAPAPORT 


THE MENNINGER FOUNDATION 


WE ARE gathered here for a
round table on diagnostic psycho- 
logical testing at a time when the quan- 
tity of demand for such testing exceeds 
what can be supplied by institutions 
training diagnostic testers and when the 
quality of demand asks too much of our 
present-day knowledge about it. It befits 
us therefore to recognize the seriousness 
of the situation: literally hundreds of 
clinical psychologists will soon enter the 
field of diagnostic testing and will face 
problems of diagnostic work for which 
our present-day systematic knowledge 
has no answer and which can be answer- 
ed only by the experienced practitioner 
by virtue of his case-by-case experience. 
It befits us I believe to gather at such a 
round table in order to consider the 
errors, inadequacies and tasks in the de- 
velopment of systematic diagnostic psy- 
chological testing. 

1 Opening address at the Round Table on
Diagnostic Testing, American Psychological
Association, Philadelphia, Pa., September 6,
1946.

Concerning the errors in the development
of diagnostic testing I believe we
must admit that there has been a funda- 
mental weakness in establishing consis- 
tent criteria for our clinical groups. 
These criteria are crucial for establish- 
ing a diagnostic frame of reference and 
for further psychological exploration of 
clinical cases by means of tests. Though 
it could be said that our weakness as to 
the criteria of psychiatric conditions is 
referable to the weakness of psychiatric 
nosology, such a defense appears to me 
to be an indefensible error. We should 
not altogether naively accept current 
psychiatric nosology; psychiatrists 
themselves are launching a vigorous at- 
tack on their nosology. Accordingly, we 
should not pull out, for example, the rec- 
ords of 100 state hospital cases, take 
their diagnosis at face value, and pro- 
ceed to establish statistically the diag- 
nostic test indications for this question- 
able sample. Our job is to contribute by 
our explorations to nosological clarifica- 
tion and if necessary to the crystalliza- 
tion of new nosological entities. This we 
will achieve by cooperating with those 








2 JOURNAL OF CONSULTING PSYCHOLOGY 


teachers, pediatricians, neurologists, 
psychiatrists, etc., who have an under- 
standing of the nature of our tests. It is 
true to be sure that if we do this the 
statisticians will attack us for “loading 
the dice” by validating our diagnostic 
tools against criterion groups segregat- 
ed to suit the potentialities of the tool. 
Yet at the level of clarity of present nos- 
ology it would be psychologically just as 
meaningless to accept without question 
the diagnosis of a hundred random state 
hospital cases and process their tests 
statistically as it would be to elaborate 
a nosology solely on the basis of tests 
and test conformity. 

Concerning our inadequacies I believe 
that the most important one to empha- 
size is our inadequacy of test theory or 
rationale. For many years we have 
searched for test indicators which would 
differentiate various aptitudes, poten- 
tialities, or pathological conditions; we 
have not, however, consistently sought 
the psychological explanations of why 
this or that indicator serves its purpose. 
As often, however, as a set of conditions 
eliminated the “standard” indicator of 
a certain potentiality from our tests, we 
were rendered helpless in obtaining test 
information about that potentiality: we 
had no rationale to tell us why usually 
just that indicator referred to the po- 
tentiality in question; we had no ration- 
ale to tell us what conditions replace one 
usual indicator with another less usual 
but equally valid indicator; we had no 
rationale to tell us where to look for 
these other indicators. Indicators in tests
are products of thought organization of 
the subject; discovery of the psycho- 
logical “why” of these indicators and 
development of a systematic rationale 
of them appear to me to be the only 
ways out of this situation. We cannot 
afford to remain diagnostically helpless 
in the absence of mechanically differen- 
tiating indicators. Therefore, I submit 


that our most important deficiency is in
our rationale, in our psychological understanding
of the processes whose end-results
are our diagnostic indicators.

Concerning our tasks: The human 
thought process has a tremendous varie- 
ty of expressions in human behavior. 
There is no reason why any and every 
one of these could not be used—if sys- 
tematically explored—to indicate char- 
acteristics of the thought process and 
thereby allow inferences concerning the 
presence of potentialities, aptitudes, and 
kinds of existing adjustment and malad- 
justment. Therefore, our outstanding 
task is the exploration of a wide range 
of behavioral expressions of our thought 
processes to establish which ones could 
be used to greatest advantage in diag- 
nostic psychological testing. 

Another task is implicit in this one, 
though its solution may be far off. We 
have learned that without a battery of 
tests our errors tend to increase and 
our discriminations remain crude. Again 
and again we have been warned that 
instead of using a rigid battery of tests, 
we should use our tests flexibly, suiting 
the battery to the individual problem. 
Yet it is at least worth considering the 
possibility that once we have explored 
a wide range of behavioral expressions 
of thought organization, we will be in a 
position to develop a more or less stan- 
dard battery of tests which will sample 
these expressions adequately and conse- 
quently facilitate the diagnosis of poten- 
tialities, aptitudes, kinds of adjustments 
and maladjustments. Once these advanc- 
es are made, flexibility in the choice of 
tests would reduce to choosing an al- 
ternative test to sample an area of 
functioning, whenever the specific char- 
acteristics of a case limit the effective- 
ness of a test routinely used in this con- 
nection. Such a systematic procedure is 
far from what can be followed today. 

In several of the papers that follow, 
you will find emphasis placed upon the
difficulty of discriminating psychiatric 
clinical syndromes and various types of 
adjustments which must be considered 
to be within the “normal range”. I believe
that this difficulty warrants serious
consideration. While this fact repre- 
sents for the science of psychology the 
continuity of normality and psychiatric 
disorder, and while it lends new support 
to the contention that premorbid per- 
sonality structure forecasts the type of 
maladjustment that may come about un- 
der stress, diagnostically and sociologi- 
cally it puts us on the spot. Diagnostical- 
ly it demands that we learn much more 
about premorbid personality and the 
forces that keep maladjustment tenden- 
cies in check, and much more about the 
criteria for existing maladjustment. So- 
ciologically the diagnostic tester who 


steps out of the psychiatric setting for 
purposes other than vocational advise- 
ment and similar pursuits is in a vulner- 
able position. The ubiquity of signs of 
potential maladjustment in normal pop- 
ulations may easily be an expression of 
the exacerbation of personal difficulties 
in our present extremely fluid and to a 
great extent planless society. These 
signs of potential maladjustment were 
kept in check, in the past, by a more 
fixed social framework of existence. 
The temptation to apply the label “neu- 
rosis” to personal problems referable to 
social conditions is an error which may 
bring the clinical psychologist danger- 
ously close to the ranks of those practi- 
tioners whom we are accustomed to 
label quacks, and whom Lee Steiner 
described so vividly in her recent book, 
“Where Do People Take Their Troubles?”








ON THE OBJECTIVE AND SUBJECTIVE ASPECTS OF 
DIAGNOSTIC TESTING¹


By ROY SCHAFER 


THE MENNINGER CLINIC 


THROUGHOUT the history of diagnostic
psychological testing, the
cry “subjective” has been raised against 
it. It has been argued, in essence, that 
data which are not subject to quantita- 
tive expression and to existing statisti- 
cal tests of significance are not subject 
matter of a scientific psychology; and, 
inasmuch as the clinician necessarily 
draws upon nonquantitative aspects of 
his test data and reasons in nonquanti- 
tative terms about all his data, he is 
censured for his disregard of scientific 
method. Scientific method, however, has 
been redefined by modern science in or- 
ganismic or field-theoretical terms. It 
is the task of today’s clinical psycholo- 
gist to meet the challenge of this redef- 
inition by reexamining his own concepts 
and procedures in the light of this new 
viewpoint. This task is all the more ur- 
gent because of the currently greatly in- 
creased demand for the assistance of 
psychological testing in many fields of 
psychological work. 


A cardinal assumption in dynamic 
psychological thinking is the relativity 
of significance of any datum. The clini- 
cian well knows that no test response 
or performance has a significance in and 
of itself, but only with respect to the 
context of responses in which it occurs. 
Empirically he has been driven to rec- 
ognize and accept what modern theoreti- 
cal psychology would demand: a relati- 
vistic interpretation of the events he is 


1 Paper read at the Round Table on Diag- 
nostic Testing, American Psychological Asso- 
ciation, Philadelphia, Pa., September 6, 1946. 
studying. We cannot know the significance
of any instance of behavior until
we know the constellation of psychologi- 
cal conditions from which this behavior 
issued and the context in which it ap- 
pears. In diagnostic testing, for exam- 
ple, the meaning of an absolutely low 
Comprehension score on the Wechsler- 
Bellevue Scale [6] can only be deter- 
mined by the level of the remaining 
scores, the qualitative features of ver- 
balization and the indications provided 
by all other tests administered. This is 
so because only by taking reference to 
every aspect of the test data can we con- 
struct a unitary picture of the subject 
in which poor comprehension becomes 
a necessary derivative characteristic. 


The process whereby any test datum 
takes on its particular significance for 
an examiner in a given case is the intel- 
lectual process of insight. It is with test 
data as with the materials lying in the 
cage of Kohler’s apes; through a pro- 
cess of insight whereby a structuriza- 
tion of the field of observation takes 
place and with it an assignation of de- 
rived meanings to all the parts of the 
field, there results an internally coher- 
ent, psychologically consistent course of 
behavior. Psychologists may at least 
lay claim to the privilege of insight ac- 
corded to Kohler’s apes [2]. In the psy- 
chologist’s insight diagnostic relation- 
ships inhering in the test results come 
to the fore and an internally consistent 
psychological picture of the patient is 
elaborated. 

The necessity for proceeding by insight
in diagnostic testing springs from
several particular problems: 1. Many 
aspects of the organism’s behavior ap- 
pear to be intrinsically not quantifiable. 
How can we, for example, quantify 
without artifice a mode of coping with 
passive needs? Or a mode of verbaliz- 
ing one’s thoughts? Or an egocentric 
mode of concept formation? Yet these 
are characteristics which are often in- 
dicated by present tests, in these exam- 
ples by Murray’s Thematic Appercep- 
tion Test [3] and the Weigl Sorting 
Test [7] particularly. In essence, how 
can we hope to quantify a total person- 
ality? 2. In working with a battery of 
tests, how can we quantify without arti- 
fice the contributions each test makes 
to the understanding of the results of 
the others? The presence of many indi- 
cations of free-floating anxiety on the 
Rorschach Test [4] often indicates that 
impairment of certain performance 
achievements on the Wechsler-Bellevue 
Scale [6] is referable to the effects of 
strong feelings of tension rather than 
depression or organic pathology. 3. In 
working with any particular test, how 
can we hope to quantify the process 
whereby a response comes about? And 
yet, as Scheerer correctly has pointed 
out, it is this process and not the end 
result scored plus or minus, 3 or 6, 
which constitutes our basic material for 
interpretation [5]. 4. How can we 
quantify that process of interpretation 
whereby a total pattern of subtest 
scores on the Wechsler-Bellevue Scale 
never before seen by the examiner im- 
mediately and correctly indicates a di- 
agnosis such as schizophrenia? Where 
in the equation could we fit the conno- 
tations of past experience aroused by 
various aspects of the present “unique” 
pattern? 

These four types of obstacles to quan- 
tification are all realities, facts of diag- 
nostic experience. They are facts which 


at once refute the identification of the 
“objective” with the “quantifiable”. 
Klein has demonstrated that the multi- 
ple regression technique of finding diag- 
nostic patterns of subtest scores on the 
Wechsler-Bellevue Scale is able to dis- 
tinguish significantly a group of schizo- 
phrenics from a group of normal con- 
trols [1], but by this technique it is not 
demonstrable that in the absence of 
every aspect of this statistically-estab- 
lished diagnostic pattern, a configura- 
tion of scores can still indicate schizo- 
phrenia. And this indication will exist 
only in the setting of insightful thinking.

Because of these limitations of the
diagnostic usefulness of statistical findings,
it becomes clear that one necessary
condition for diagnostic testing is a
dynamically-oriented rationale of the 
functions or processes underlying the 
various test performances. This ration- 
ale must be consistent with what is 
known concerning the development of 
ego processes in the field comprising the 
biological organism and its social sur- 
roundings. This rationale must recognize 
that the different components of what 
is called “intelligence” are merely points 
of view from which it is profitable to 
consider the activities of the organism. 
Intelligence functions are not to be con- 
sidered static one-dimensional traits. 
The impairment of any of them may be 
referable to several different pathological
conditions, each condition impairing
a different aspect of the performance 
but all the conditions lowering the ob- 
tained score. Block Designs, in the 
Wechsler-Bellevue Scale, for example, 
may be impaired by (a) organic path- 
ology disrupting an ordinarily “ab- 
stract” conceptual approach to the prob- 
lem; (b) a state of extreme tension in 
which blocking and a concentration im- 
pairment greatly reduce visual-motor 
coordination; (c) a severe depression 
retarding the speed of visual organizing
and motor performance; and so
forth. The overlapping of functions im- 
plicit in this example is a complicating 
factor which can be simplified only by 
extending our rationale of all the per- 
formances elicited by our tests. Ration- 
ale must occupy a figural position in the 
process of diagnostic insight; it allows 
the diagnostician that flexibility of understanding
whereby he can transcend
mechanical quantitative thinking and 
cope with the new and unique. 

What then is the role of statistical 
investigation in the theory and practice 
of diagnostic testing? Statistics can 
and do point to relationships among 
quantifiable aspects of test data which 
recur with significant frequency with 
respect to some criterion such as a par- 
ticular form of neurosis. In this they 
render an invaluable service by pointing 
to the need for psychological hypothe- 
ses or rationale at just those places 
where relationships are discovered. In 
themselves, statistics do not supply 
these hypotheses. A correlation coeffici- 
ent for example cannot establish a diag- 
nosis, as Klein has shown [1]. Statis- 
tics, in a dynamic psychology, therefore 
are not used to establish the chance 
probability of an interpretation or diag- 
nosis, but rather lead to and subsequent- 
ly verify hypotheses which can establish 
meaningfully probable diagnostic con- 
clusions. Statistics serve also to put 
into communicable and therefore teach- 
able form the meaningful or determined 
relationships uncovered by repeated pro- 
cesses of insight. Thus statistical find- 
ings are integrally involved in rationale 
as described above in that they indicate 
the usual forms and locations of the 
diagnostic relationships explained by 
that rationale. Accordingly thorough 
familiarity with statistically established 
trends will also occupy a central or fig- 
ural position in the process of diagnos- 


tic insight. 

Two additional figural conditions in 
the process of insight must be men- 
tioned, besides the already mentioned 
conditions of thorough familiarity with 
the results of statistical investigations 
and the meaning given these by a ra- 
tionale of the functions underlying vari- 
ous test performances. The third neces- 
sary condition is a thorough grounding 
in psychodynamic concepts, especially 
in those issuing from the psychoanalytic 
and organismic schools of thought. Thus
far in the history of diagnostic testing 
the most fruitful hypotheses and con- 
sequent investigations have been stimu- 
lated by these schools of thought. With 
the concepts of these schools as a back- 
ground, diagnostic insights which are 
unitary and transposable in character
become possible, while juxtaposition of 
psychologically inconsistent fragments 
of “interpretation” tends to reduce to a 
minimum. With respect to the fourth 
and final condition for insight, a plea 
must be made that psychologists recog- 
nize and teach that the richest source 
book of psychological understanding is 
carried by the individual within him- 
self; that through a process of self-ex- 
amination, of reopening of one’s experi- 
ences with other people and examining 
the intricately intertwined affects, ideas, 
needs, that go into these, the individual 
will bring to the diagnostic process a 
greatly heightened sensitivity to the 
logic peculiar to psychodynamics. The 
development of this sensitivity is thus 
an objective condition prerequisite for 
insightful thinking. 

We must now raise the question 
whether and to what extent this process 
of insight is to be considered subjective. 
That the diagnostic process will occur 
“within” the clinician is of course obvi- 
ous; in this sense it will be subjective; 
it will not be worked out on a comptometer.
However, that the diagnostic process
is neither verifiable nor communicable,
and in this sense not objective,
is an entirely different proposition and 
one which we must reject. Objectivity 
is not established by popular agreement 
or by purely quantitative thinking but 
only by agreement among competent observers.
In this word competent lies the
necessary condition for establishing the 
validity of test findings. Verifiability of 
diagnostic conclusions or predictions is 
possible where competent analysis of 
case history and current status has been 
carried out separately from diagnostic 
analysis and where the results of these 
two processes of insight are checked 
against each other. Communicability is 
a more difficult criterion of objectivity 
to meet. Recognizing as we do that 
every case represents an essentially 
unique dynamic picture, we cannot jus- 
tifiably attempt to teach mechanical 
rules of interpretation. But we should 
attempt to introduce into the thinking 
of the apt student a number of figural 
considerations, in other words, a frame
of reference, which will limit and specify
the directions available to the process
of diagnostic insight. This frame
of reference will include basic psychodynamic
theory, test rationale, familiarity
with previous findings and self-knowledge.
This is the basis on which
agreement among competent diagnosti- 
cians or in other words objectivity can 
be established. 

The arguments presented in this paper
do not imply that all the conditions
for diagnostic insight are already established.
Far from it. Many of our so-called
“insights” and rationales are still
uncertain and incomplete. The research 
tasks lying ahead of clinical psycholo- 
gists are great in number and scope. 
Their emphasis will have to be chiefly 
upon exploring new aspects of the or- 
ganization of personality and thought 
processes, as well as upon new ap- 
proaches to old aspects of these. To a 
large extent then this paper has de- 
scribed an ideal in diagnostic testing 
which we are only slowly approximat- 
ing. 


THE USE OF PROJECTIVE METHODS IN
GROUP TESTING¹


By RUTH L. MUNROE 


SARAH LAWRENCE COLLEGE 


THE PROBLEM I would like to discuss
is the feasibility of using the
projective approach in those normal 
situations where time is at such a pre- 
mium that the only possible answer is 
the group test. In education, in indus- 
try, in a host of ordinary life situations 
and in research the personality factor 
is of acknowledged importance, but it 
is impossible to study large groups of 
ordinary people with even a limited 
clinical battery of tests. They must be 
tested, if at all, by relatively quick 
group or self-administering methods. 
The test materials must be evaluated 
quickly, and the results made available 
in intelligible form. For some purposes, 
especially for research, the test proce- 
dures must be repeatable under condi- 
tions clearly described, and the final 
statement must be made in quantifiable 
or at least easily manipulable terms. 

The projective method as developed 
in the clinic meets none of these require- 
ments. It has so many advantages, how- 
ever, that it seems important to consider 
whether and how it can be modified for 
group use. In the following pages I 
shall rely to some extent upon empiri- 
cal data, but my aim is to discuss possi- 
bilities rather than to present finished 
procedures. I think available data are 
promising enough to suggest the value 
of careful analysis of the assets of the 
projective approach in group testing 
and open-minded consideration of the 


1 Paper read at the Round Table on Diag- 
nostic Testing, American Psychological Asso- 
ciation, Philadelphia, Pa., September 6, 1946. 


problems involved. The purpose of this 
paper is orientation to the special task 
of adaptation. 

Two aspects of the projective ap- 
proach seem to me basic. The first in- 
volves the presentation of complex but 
relatively unstructured materials with 
a minimum of instruction in order to 
obtain a complex specimen of sponta- 
neous action from the subject. The sec- 
ond involves acceptance of the unique- 
ness and complexity of this specimen in 
evaluation—a condition especially diffi- 
cult in group testing, which we will 
postpone for later examination. 


PROJECTIVE MATERIALS IN GROUP 
TESTING 


In regard to the first aspect, the pre- 
sentation of materials, it is easy enough 
to ask the subject to write out in a group 
what he sees in inkblots, the story sug- 
gested by pictures, answers to vague 
leading questions, associations to words; 
to have him draw or complete pictures, 
etc. If there is any merit in such complex 
spontaneous products as such, then the 
group products should have merit, even 
though the examiner has not been able 
either to make those adaptations in ad- 
ministration which encourage the sub- 
ject to fullest expression or to follow up 
special points for clarification and elab- 
oration. I think that most projective 
testers (clinicians) have been so dis- 
turbed by the inevitable loss entailed in 
group administration that they have not 
paid sufficient attention to the possible 
residual value. It is very hard for a 
clinician accustomed to responsibility
for judgment about an individual case 
to accept as legitimate compromise er- 
rors which he could easily avoid by ask- 
ing the subject what he meant. Theoreti- 
cally, however, the complex spontaneous 
product of the subject might be the main 
thing, and the loss sustained in group 
administration no greater than the low- 
er aspiration level of the group test 
would warrant. 

This theoretical expectation has by 
now fairly substantial empirical con- 
firmation. One way or another clinicians 
have been forced by practical or re- 
search requirements to be less exigent 
about methods of administration. Many 
instances of successful use of group ma- 
terials are reported in the literature and 
many more in conversation. The results 
obtained from the group Rorschach at 
Sarah Lawrence College to be described 
later will serve as example. Statistical 
studies and experience indicate that sub- 
jects see about the same things in about 
the same way when inkblots are shown 
on the screen as when they are held in 
the hand. Minor variations in the re- 
action of the subject, and fairly serious 
difficulties in scoring² operate to reduce


2 By far the greatest amount of group pro- 
jective testing has been done with the Ror- 
schach blots. For reasons about to be consid- 
ered this seems to me desirable at present— 
it is the test we know most about. Actually 
it is not a good test for group administration 
because basic scoring depends too much on 
points the subject may not spontaneously re- 
port, for instance that it was the blackness of 
the blot which made him think of a bat. Es- 
tablishing these points requires special inquiry 
which can only be approximated in group ad- 
ministration. 

Stories, drawings and the like would seem 
to be more self-contained, to require less special
inquiry to determine essential data. The
ingenuity of psychologists can surely devise 
better methods of eliciting a complex sponta- 
neous product more easily scored than our 
present adaptations of tests developed in the 
clinic, as soon as we understand more clearly 
just what is needed. 

It is for this reason that elaborate research 
on the correspondence of group and individual 
Rorschach protocols does not seem to me worth 


the accuracy of evaluation in the indi- 
vidual case. Some protocols must be dis- 
carded as unscorable. Some groups ac- 
cessible to individual testing are not 
suitable for group work. Under no cir- 
cumstances can the examiner rely upon 
the group protocol for sensitive diag- 
nosis with anything like the confidence 
he may feel for a carefully administered 
Rorschach. Nevertheless the satisfac- 
tory results obtained by applying famil- 
iar test principles to the analysis of the 
group products would seem to show 
clearly that they are essentially similar 
to the individual tests. 

The similarity of group test materials 
to those obtained in individual testing 
gives the group test adapted from cur- 
rently used individual methods an un- 
precedented advantage. It can start with 
an enormous background of information 
derived from experience with thousands 
of cases carefully observed in the clinic. 
The loss in accuracy and richness of the 
group Rorschach protocol as against in- 
dividual administration becomes trivial 
if one compares it with the difference 
between the questionnaire and the clini- 
cal interview. The questionnaire cannot 
be considered a relatively inaccurate 
version of the interview. Checking yes 
or no to printed questions involves a 
very different process from answering 
similar questions adaptively phrased 
and followed up in face-to-face encoun- 
ter, a fact too generally recognized to 
require further discussion here. Each 
questionnaire is really a new instrument 
which must be independently studied. 
Measures derived from factorial analysis
are also new measures whose meaning
must be empirically established.

while. I believe that the test is extremely useful
at present, but that it will be superseded
by group methods less intrinsically difficult. It
seems more profitable to get ahead with basic
information by means of a test involving an
obvious margin of error than to make finicking
efforts to reduce an error small relative to the
total present contribution and likely to be irrelevant
in later test developments.

Tests based upon the measurement of 
simple perceptual or motor functions 
seem to me very promising — but again 
the experimenter starts from scratch in 
interpreting and validating his observa- 
tions. The same problem applies to the 
Multiple Choice Rorschach which de- 
pends upon recognition rather than 
spontaneous production of responses, 
and also to those new complex projective 
materials which I hope will ultimately 
supersede the Group Rorschach (see 
preceding footnote). 

An even more unique advantage in 
adapting current clinical tools to group 
testing is the introduction of a possible 
flexibility in evaluation. A questionnaire 
is standardized and validated as a whole, 
or with a few prescribed subscores. The 
moment the examiner departs from the 
official scores and uses the subject’s ans- 
wers to specific questions in any other 
way, he is strictly on his own. The good 
clinician often does so, doubtless suc- 
cessfully. In so doing, however, he relies 
on his own intuition and experience. 
Practically nothing is printed about the 
importance of a single question or a 
combination of questions as indicative 
of some refinement in diagnosis. In re- 
search the official scores are used, or the 
whole test is revamped as in Flanagan’s 
contribution to the Bernreuter. 

The most complete sceptic can agree 
that a Group Rorschach protocol yields 
data about as valid on one point as another.³
Rorschach literature and teaching
and experience give a substantial
background to the almost infinite permutations
and combinations observable
in the test performance.

3 This statement is not strictly true. Subjects
are more likely to describe action in a
manner which makes an M scoring inevitable
than to mention that the worms in card X are
green, a point crucial for the FC score but
incidental for the subject. Inquiry may establish
that he had in mind “those green things”,
but the spontaneous comment of the normal
urbanite may easily omit this refined distinction
among worms as unimportant.

Even with accurate scoring there is more
agreement among Rorschach examiners on
some aspects of performance than others.

The examiner
aiming at individual evaluation or re- 
search is not restricted to the judgments 
indicated by the few scores previously 
established, but may ask: “What out- 
standing characteristics does this per- 
son show in his Rorschach perform- 
ance?”; or “What differences can be ob- 
served between these two groups?”. If 
he examines the test materials resource- 
fully, the psychologist may be able to 
point out facets of the subject’s person- 
ality especially important in the immedi- 
ate situation which are not covered by 
a few prescribed test dimensions. Or he 
may suggest trends common to groups 
of individuals which are obscured by 
pronounced individual differences in be- 
havior. These trends can at least serve 
as useful hypotheses for more focused 
study. 

We cannot here consider the special 
problems of repeatability and quantifi- 
ability involved in research on group 
trends. The points I have tried to stress 
are (a) that the projective method of- 
fers a complex specimen of spontaneous 
action even when administered to 
groups and (b) that where current in- 
dividual methods are adapted to group 
use, the group tester for the first time 
can approach the problem of evaluation 
with something of the resourcefulness 
and knowledge available to the clinician 
working with similar individual meth- 
ods. 


METHODS OF EVALUATION FOR THE 
INDIVIDUAL CASE 


The second aspect of the projective 
method—acceptance of the product as a 
unique and complex specimen—presents 
the most serious difficulty in fulfilling 
the requirements of group testing. The 
projective method has value essentially 
because isolated observations are
interpreted in relation to the total picture.
The Rorschach CF score, for instance, 
seems to represent affective lability in 
the subject. Relatively uninhibited use 
of color appears to have similar signifi- 
cance in spontaneous drawings with 
free choice of media. Yet to rate a sub- 
ject’s affective lability in terms of the 
number of CF scores or amount of color 
in drawing would be meaningless. Eval- 
uation of CF as a personality trait de- 
pends upon a dozen other factors in the 
subject’s performance. The final descrip- 
tion of a subject giving 3 CF may 
vary all the way from an aggressive 
bully to Caspar Milquetoast. Fragmen- 
tation of any projective method into the 
separate item scored or observed may 
yield quick and comfortably “objective” 
results. Unfortunately the results are 
very likely to be wrong, as has been 
repeatedly demonstrated where the 
scoring fragment is taken as the meas- 
ure of a particular personality trait. 

Undoubtedly results could be im- 
proved by a more insightful selection of 
separate items. Improved selection of 
repeatable items seems to me essential 
for some types of large-scale research. 
Furthermore the aggressive bully and 
Caspar Milquetoast may well have some 
basic trends in common which could be 
fruitfully examined in an investigation 
of dynamic patterns for scientific pur- 
poses. In evaluating the individual case, 
however, one must try to grasp how 
these trends interact with others to 
produce the unique functioning person- 
ality. The employer is more concerned 
with Caspar’s overtly timid behavior 
than with the repressed aggression he 
might show to his psychoanalyst. For a 
grasp of the interrelationships of com- 
plex data I think there can be no substi- 
tute for the wit of the trained psycholo- 
gist. 

It is worthwhile to point out that
interpretation of the results of
standardized tests in the individual case
also depends upon the judgment of the
psychologist, except in so far as they are
delivered over to the lay administrator for
judgment. Everyone knows that all of 
the standardized tests, even intelligence 
tests, may be simply erroneous in some 
cases and that the significance of an ac- 
curate score on a particular trait must 
be seen in relation to scores on other 
traits and background data. It seems 
proper to emphasize that the projective 
method attempts to apply within the 
confines of the test situation a process 
of judgment fully accepted in clinical 
psychology generally. The novelty seems 
really to lie more in the point at which 
judgment is introduced rather than in 
the introduction of judgment.

Statistical checks on reliability and
validity have tended, I think, toward a 
somewhat spurious security in the ob- 
jective nature of testing devices in 
practical evaluation. Reliability figures 
are never perfect, which means that 
some individuals may very well show 
entirely different results on retest. 
Which individuals may shift is not 
specified. Validation figures are uni- 
formly rather poor, even in the field of 
intelligence testing. A statistical state- 
ment of the margin of error found in a 
particular experimental group in a par- 
ticular situation is not a realistically ef- 
fective guide toward decision as to the 
validity of the score in an individual 
case in another situation. For instance, 
a study of Bernreuter scores at my col- 
lege indicated high statistical signifi- 
cance. The probability (P) for the best 
combination of scores empirically de- 
termined for our group was .00001. Ac- 
tually 20 out of 25 students considered 
seriously maladjusted by the psychia- 
trist had poor scores. When one consid- 
ers, however, that 139 students in the 
experimental group had similar poor
scores and did not—so far as is known— 
show any particular maladjustment, one 
may wonder just how much help a P of 
.00001 is to the teacher in deciding 
which one of 7 students with poor scores 
he should worry about. Precision of 
statement of error for the group and 
repeatability of scoring procedures 
really offer no sort of scientific guaran- 
tee as regards the correctness of judg- 
ment in the individual case. 
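
The arithmetic behind this objection can be made explicit. The sketch
below, in Python, uses only the counts given above (25 students judged
seriously maladjusted, 20 of them with poor scores, and 139 other
students with similar poor scores). The variable and function names are
illustrative assumptions, and since the size of the whole experimental
group is not stated, only the two proportions these counts determine
are computed.

```python
from fractions import Fraction

# Counts taken from the Bernreuter example in the text.
maladjusted_total = 25   # students the psychiatrist considered seriously maladjusted
maladjusted_poor = 20    # of those, students with poor scores
others_poor = 139        # students with similar poor scores and no known maladjustment


def hit_rate(found, total):
    """Share of the maladjusted students whom the poor score picks out."""
    return Fraction(found, total)


def share_of_poor_scores_confirmed(true_cases, false_cases):
    """Share of all poor scores that belong to maladjusted students."""
    return Fraction(true_cases, true_cases + false_cases)


sensitivity = hit_rate(maladjusted_poor, maladjusted_total)
confirmed = share_of_poor_scores_confirmed(maladjusted_poor, others_poor)

print(float(sensitivity))          # 0.8
print(round(float(confirmed), 3))  # 0.126
```

On these counts about seven students without known maladjustment carry
a poor score for each one the psychiatrist considered maladjusted,
however small the P for the group.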

Where the judgment of the examiner 
is included in the test procedure itself, 
the margin of error cannot be stated so 
precisely. The avowed purport of the 
projective test is not definitely delimit- 
ed, nor can scoring by a second examin- 
er be expected to yield exactly the same 
results. Certain uncontrollable sources
of error are introduced. In actual
testing of the individual, however, the
statement that the test hopes to measure
“introversion”, and exact repeatability of
scoring, constitute so small a part of the 
total problem involved in effective evalu- 
ation of a person in a given situation 
that concentration on control of these 
points seems a rather meager version of 
scientific responsibility. Naturally the 
scientist wants a demonstration of the 
validity of his procedures, and will want 
to assure himself that positive results 
are ascribable to the instrument rather 
than to extraneous factors. Once this 
demonstration is made, however, it 
seems to me that the over-all correctness 
and usefulness of the results are the 
most important criteria. If the value of 
a test can be generally improved by al- 
lowing a competent examiner to use his 
training, it seems to me no less scientific 
to acknowledge this “uncontrolled” vari- 
able as a possible source of error than 
to control a relatively minor variable 
with neglect of other factors known to 
be of enormous importance, i.e., the
constellation of trends in the individual 
subject, attention to data not yet
reduced to a formal score but of clear
clinical significance, and the like.

The preceding papers have taken the 
importance of these latter points for 
granted. I approach them more self- 
consciously because the group test 
is traditionally “standardized”, so that 
reliability and validity figures are almost 
synonymous with its scientific repute. 
Clinicians accustomed to the fuller in- 
formation and time for consideration 
available in individual testing view the 
quick handling of imperfect materials 
with just as great suspicion. My sug- 
gestion is that the philosophy of the 
group test might well preserve the ac- 
ceptance of a margin of error in the in- 
dividual case beyond the point at pres- 
ent tolerable for the responsible clini- 
cian, and also view more realistically 
the problem of control of error. I sug- 
gest that statistically precise reliability 
and validity figures do not sufficiently 
answer this problem, and that it is more 
important to make useful judgments 
with as few errors as possible than to 
“control” a small aspect of a measure 
in a manner which actually prohibits 
utilization of our best source of infor- 
mation. I suggest that safeguarding the 
individual subject from errors in evalu- 
ation depends more upon careful use of 
test results than upon establishing the 
customary statistical figures for a group. 

At this point I would like to discuss 
our experience with the Group Ror- 
schach at Sarah Lawrence College as an 
illustration of the philosophy of testing 
just described. The Group Rorschach 
has been administered to every enter- 
ing student since 1940. The examiner 
writes a descriptive sketch of each girl 
covering such aspects of her intellectual 
and emotional development as seem ob- 
servable with some security in quick 
work. For several years conditions of 
blind analysis were maintained and the 
accuracy of the sketches checked as
objectively as possible. Teachers were
given the sketches without names in 
groups of 4 or 5 and asked to identify 
the students intended. This they were 
able to do with a degree of success far
beyond chance expectation.⁴ Teachers
were also asked to underline in black 
statements they considered correct and 
in red those they considered false. Out 
of 1,934 separate statements 143 (8 per 
cent) were underlined only in red and 
322 (17 per cent) in red by one teacher 
and in black by another. Errors in iden- 
tification seemed to be due mostly to 
middle-of-the-road descriptions of mid- 
dle-of-the-road students, or to a single 
especially vivid wrong or badly phrased 
statement in a long paragraph which 
seemed otherwise essentially correct. 
For the most part teachers accepted 
these sketches as reasonably accurate 
when they were told which student was 
intended. A few appeared definitely 
wrong in major respects, and almost 
every sketch contained dubious details. 
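
The proportions quoted for the teachers' underlining can be rechecked
from the raw counts. The following Python sketch assumes nothing beyond
the three numbers printed above; the names are illustrative, and the
third figure simply covers whatever statements no teacher marked in
red, whether or not they were underlined in black.

```python
# Counts reported in the text for the teachers' red and black underlining.
total_statements = 1934
red_only = 143        # underlined only in red (judged false)
red_and_black = 322   # red by one teacher, black by another


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


red_only_pct = percent(red_only, total_statements)
disputed_pct = percent(red_and_black, total_statements)
never_red_pct = percent(total_statements - red_only - red_and_black,
                        total_statements)

print(red_only_pct, disputed_pct, never_red_pct)
```

The second figure agrees with the printed 17 per cent. The first comes
out nearer 7 than the printed 8 per cent, which may reflect rounding
or a slip in the count or the percentage as printed.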

Perhaps one may say that the real 
check on the sketches began after the 
experimental period when they were put 
to the test of practical use. The
consensus of opinion after several years seems
to be that the sketches are substantially
correct and helpful. Obviously no state- 
ment is used in guidance which is not 
confirmed by firsthand observation, but
the test frequently calls attention to 
aspects of the student’s psychology easily
missed by teachers or learned belatedly.
A simple example is the test


4 Obtaining judgments on every freshman
from every teacher who had enough freshmen
in his class for such a “matching experiment”
involved complications for which we could find
no simple statistical answer. We have 8 sepa-
rate Chi-squares differently calculated depend-
ing on the number of judges per student, i.e.,
upon the number of courses a girl happened
to take where there were at least 4 other en-
tering students. All of the Chi-squares are
good. The most accurate estimation seems be-
tween 15 and 20, one degree of freedom,
highly significant.


revelation of a deep uncertainty masked by a
cocksure manner. An apparently inde- 
pendent and belligerent girl may re- 
spond far better to friendly but firm 
control than to the freedom of action 
she stridently demands or the discipli- 
nary putting-in-her-place which her 
manner tends to provoke. If the test 
remarks upon a bent of mind which 
fundamentally dislikes literal fact, the 
item may be useful in helping the stu- 
dent decide whether she should continue 
a major in science. Spontaneous curric- 
ular choice sometimes comes from a ro- 
mantic attachment to a scientist or from 
a glamorized view of science. Test de- 
scription of a bent of mind should not, 
of course, decide such an issue, but it 
may be a valuable ingredient to add to 
other data. The college is very careful 
to use the Rorschach results only in con- 
junction with other data, indeed prob- 
ably rejects rather too promptly com- 
ments which do not seem immediately 
intelligible.⁵

The use of the Group Rorschach at 
the college may be compared with the 
use of various standardized tests, espe- 
cially the Bernreuter which was admin- 
istered over a period of eight years and 
carefully studied. For two years the 
Group Rorschach and the Bernreuter 
were both administered to every


5 Careful review of many cases indicates
that some of the errors in the sketches are due
to examiner-failure in quick evaluation, or that
the interpretive slant suggested by extra-test
data could modify a particular test comment
and offer important information about the stu-
dent not immediately observable to the teacher.
A sizeable number of test comments initially
rejected by teachers prove essentially correct
in the long run. Time permitting, a more care-
ful integration of test and other materials
should be made for the best understanding of
each case. Since time never permits such in-
tegration for every case, the policy of the col-
lege in rejecting statements which do not fit
seems the most expedient. Of course this su-
perficial use of test materials in the ordinary
run of educational problems does not preclude
more intensive study of especially difficult
students.

student. As described elsewhere [1] the
Rorschach was handled in a manner 
which yielded a quantitative general 
score for “adjustment” in addition to 
the qualitative sketch now under discus- 
sion. Checked in the same manner, this 
quantitative score proved more accurate 
than the Bernreuter scores on the same 
students. I think the most important 
moral to be drawn, however, is that the 
college actually uses the qualitative Ror- 
schach results whereas the quantitative 
scores mildewed in the files. A descrip- 
tive statement is much harder to deal 
with statistically than a single score, 
but it is much easier for a teacher to 
handle in practical judgment. A teacher 
is legitimately baffled by a poor score 
which may or may not be worth bother- 
ing about in the individual case. A de- 
scriptive evaluation of such assets and 
liabilities as he can observe himself in 
the student is on the whole more easily 
used and more easily criticized. Some 
teachers (or other “lay” users of psy- 
chological tests) will be overenthusiastic 
about the test statements or reject them 
with too little consideration. This phe- 
nomenon has also been observed in the 
handling of scores from standardized 
tests. (E.g., some colleges admit no
applicants who fall below a specified score
on the routine intelligence tests; others 
pay little or no attention to the score.) 

If one’s aim in group testing is mere- 
ly useful application of “test” informa- 
tion, I believe that the projective meth- 
od with qualitative description deserves 
very serious consideration. Without for 
a moment suggesting that the projective 
methods at present can offer results of 
the calibre we would like in group test- 
ing, I think I can urge (a) that the 
practical possibilities of such testing are 
enhanced by allowing the trained ex- 
aminer to use his judgment in handling 
promising test materials, and (b) that 
a descriptive account adapted to the
special requirements of a particular
situation is on the whole handled more
appropriately by the “lay” user than a
single score where a large margin of 
error is precisely stated in statistical 
terms irrelevant to judgment in the in- 
dividual case. 


THE QUANTIFICATION OF PROJECTIVE 
METHODS 


Thus far I have discussed the possi- 
bilities of group projective testing for 
the purpose of individual evaluation in 
the hands of thoroughly trained psy- 
chologists and ultimate consumers (e.g., 
teachers) sufficiently informed for sen- 
sible, critical application. I believe that 
this purpose is potentially of widespread 
value if on the one hand psychologists 
can be trained to orient their work to 
the special needs of the practical situa- 
tion, and on the other the ultimate con- 
sumers can learn more about how per- 
sonality factors affect the concrete per- 
formance which is their special pur- 
view. Far from despairing of the end- 
less complications of this mutual learn- 
ing process, I think it the main hope of 
the future. It is being instrumented 
already by forces beyond the scope of 
the test-maker. 

For some purposes, however, for some 
large-scale screening jobs and for cumu- 
lative research, further codification of 
the projective methods is essential. Test 
procedures must be described with suf- 
ficient definition for exact repeatability 
under varying conditions. Test results 
must have a less cumbersome format 
than pages of description to allow for 
feasible summation of materials con- 
cerning group trends drawn from many 
individual records. These requirements 
necessarily entail very serious loss in 
accuracy and richness of interpretation 
for the individual case. Hence the im- 
portance of keeping the aim of testing 
clearly in view. Projective methods may 
be used in one way for maximal insight
into the individual, and in other ways 
for learning about common trends 
among groups of subjects. 

Careful analysis of the special prob- 
lems of quantification is not germane to 
the present symposium. I cannot
forbear mention of one general point,
however. Projective test literature,
especially the Rorschach literature, is full
of quantitative studies, many of which 
have yielded very disappointing results, 
some of which seem very promising. By 
and large it seems clear that poor re- 
sults are obtained when a limited num- 
ber of the traditional scores (which 
were never intended for use in isola- 
tion) are equated with personality traits 
without further ado and correlated with 
external factors. Promising results are 
obtained when combinations of scores 
are used which are known clinically to 
have some stability of meaning from 
one individual to the next, or when the 
clinically experienced examiner observes 
certain recurrences in the performance 
of special groups which are not gener- 
ally encountered. Such observations 
may often be reduced to quantifiable 
terms and statistically analyzed, even 
though they may involve aspects of test 
performance usually not formally scored. 

In short, clinical knowledge of the
instrument must be used to suggest likely
lines of quantification. It can also offer
some clue to the meaning of statistically 
significant group differences as actually 
observed. When the judgment of the 
skilled examiner is applied in setting up 
large-scale research, much of the re- 
sourcefulness and flexibility of the pro- 
jective method may be retained in spite 
of the limiting requirements of quanti- 
fication. It seems to me most unfortu- 
nate that so many investigators confine 
their research efforts to careful tabula- 
tion of familiar items instead of using 
their knowledge of the complexities of 
individual performance to select quan- 
tifiable data of deeper import, less sub- 
ject to fluctuation in significance from 
case to case. 

Even in quantitative research the two 
basic aspects of the projective method 
may be retained: complexity of the
subject’s product, and acceptance of the
complexity in evaluative procedures.
The simplification necessary for
objective mathematical treatment can
capitalize on clinical experience in determining
fruitful lines of codification. 


REFERENCE 


1. Munroe, R. L. Prediction of the adjust- 
ment and academic performance of college 
students by a modification of the Rorschach 
method. Appl. Psychol. Monogr., No. 7. 
Stanford Univ., Calif.: Stanford Univ. 
Press, 1945. 








THE APPRAISAL OF CHILD PERSONALITY¹
I AM AVOIDING the word diagnosis
because I am not dealing with diag- 
nostic problems in the usual sense, 
which implies that one is differentiating 
a neurotic or psychotic group. I am con- 
cerned with the everyday problem of the 
school psychologist whose job it is to 
help teachers understand the abilities, 
potentialities, blocks, and anxieties of
the twenty to fifty per cent of school 
children who at one time or another 
need more than everyday classroom 
help. 

Since the appearance of the Stanford- 
Binet test the starting point of the 
school psychologist has been the Binet, 
and the I1.Q. has been a major guide- 
post in getting a sense of direction about 
a child’s educational capacity. This will 
remain true for a long time although we 
are more and more recognizing that 
with large numbers of children of ade- 
quate intelligence, personality problems 
are more relevant to their learning diffi- 
culties than are limitations of intelli- 
gence. The Rorschach, Thematic Apper- 
ception Test and paper-and-pencil tests 
like the Rogers provide tools for search- 
ing more deeply into the drives and pre- 
occupations that affect the child’s work 
in school. But tests like the Rorschach 
and the TAT are so time-consuming to 
score and interpret that they cannot be 
used in their present form with the 
large numbers of normal children who 
need help, although children whose


1 Paper read at a Round Table on Diag-
nostic Testing, American Psychological Asso-
ciation, Philadelphia, Pa., September 6, 1946.

difficulty is great enough to receive agency
help may receive a day or more apiece 
of intensive study by such means. We 
do not yet have an adequate study of 
the possibilities of Munroe’s inspection 
technique as a short-cut in the Ror- 
schach study of children; nor do we 
know enough about the relationships be- 
tween different sorts of Rorschach find- 
ings and different kinds of educational 
success and difficulty, to make the edu- 
cational use of the Rorschach fully re- 
warding. 

Pending such developments it can be 
helpful to make as full qualitative use 
of the Binet as possible. This may pro- 
ceed along three lines: content analysis, 
form analysis of drawings, and distri- 
bution of successes and failures. 

Much of the verbal material can be 
useful for content analysis. The words- 
in-one-minute test is of course one kind 
of free-association test. With practice 
the words, pauses, groupings, etc., can
be recorded. Doubtless any experienced 
tester could give illustrations like the 
following: a girl who had poor relations 
with children gave a list of objective 
words from her immediate surround- 
ings much as other children do, but in- 
terpolated the group crazy-blame-fear. 
Discussion with her teacher revealed 
that the children had been calling her 
crazy because of her excessively shy de- 
tached behavior. A tense boy who alter- 
nated between very controlled behavior 
and explosions of aggressiveness toward 
other boys gave sequences in which am- 
bitious adult abstract words alternated 
with words of violence, destruction, and
hostility. In such instances, the verbal 
material serves to highlight the behav- 
ior of the child and reveal the depth of 
emotional patterns which might other- 
wise be taken less seriously than they 
deserve. Definitions, abstract words, 
etc. are important to watch: one child 
of average I.Q. gave definitions in highly
sensory and colorful terms suggesting
artistic possibilities which were later 
confirmed but had not previously been 
suspected. 

The areas of clarity in definition may 
be indicative; the child who can define 
“obedience” but not “defend” is giving 
us hints about his relations to adults 
and to children which should be fol- 
lowed up. 

Reading memory distortions may be 
revealing, as also inadequate picture in- 
terpretations. For example the child 
who comments on Colonial Days that 
the man shouldn’t have a gun to shoot 
because it is wrong to shoot, is giving 
a moralistic interpretation which ig- 
nores the life and death reality problem 
of the picture; such unrealistic moraliz- 
ing would make the child’s social rela- 
tions difficult if it were generally char- 
acteristic of him. 

In studying the content of the full 
Binet record it is of course important 
to watch for repetitions and congruent 
deviations. One swallow doesn’t make a 
summer, but a flock of birds flying in 
the same direction usually means some- 
thing. 

Form analysis of an elementary sort 
may be done with all drawings produced 
by the child before, during or after the 
test; the paper cutting drawings, draw- 
ings made for the memory-for-designs 
test, the purse lost in the field, together 
with a spontaneous drawing or two col- 
lected at the beginning and end of the 
test, all offer records of the child’s 
graphic patterns. Oversensitiveness to
limits, overimpulsiveness, etc. may give
clues related to what is found in the con- 
tent analysis. 

The distribution of successes and fail- 
ures is inspected for consistent and in- 
consistent patterns. Occasionally a child 
will pass high tests where he has failed 
lower ones; while this may sometimes 
be due to learning that has taken place 
during the test, it also appears in chil- 
dren who are bored by easy material 
and stimulated by harder material. 
When a child succeeds consistently with 
digits and other precisely defined tasks, 
and fails in tests requiring insight, it is 
worth asking whether he is dependent 
on authority and precise tangible accom- 
plishments. Where the opposite pattern 
appears, anxiety may have stimulated a 
concentrated effort to understand and 
deal with social relations at the expense 
of routine learning. More study of the 
meaning of such patterns in the total 
picture of a child’s personality is needed. 

As in any personality appraisal, the 
total picture is more important than 
separate items taken alone. The child 
who overcautiously stays within limits 
on the drawing tests, leans on digits, 
shows poor insight, defines “obedience” 
but not aggressive words, may be giv- 
ing important clues to a personality 
structure; he may be anxious about au- 
thority to the extent of inhibition of 
normal childish spontaneity, and this 
would affect work as well as social rela- 
tions in a modern school. 

The assumption here is, of course, 
that mental functioning cannot be di- 
vorced from personality structure and 
that much can be learned about person- 
ality from any sample of intellectual be- 
havior and vice versa. More and more 
evidence of this interrelation has been 
presented in recent psychological litera- 
ture: low test scores of institutionalized 
children, records of improved I.Q. after 
therapy, changes in I.Q. related to
emotional adjustment, the work of F. L.
Wells at Harvard on personality of stu- 
dents of different achievement, and the 
recent studies of diagnostic implications 
of intelligence tests at the Menninger Clinic.
This assumption, that intellectual ap- 
proach and personality are intimately 
related, also underlies the various
methods of analyzing painting (e.g. Schmidl-Waehner)
and drawing (e.g. Werner Wolff), play
technique, and stories. In
the case of children under ten, many 
schools can give the tester a sizable col- 
lection of pictures and stories which can 
supplement the material the tester is 
able to collect in her limited time. 
While our approach to the appraisal 
of child personality has been enormous- 
ly enriched by the development of pro- 
jective techniques we are still very lim- 
ited at two points. The child who does 
not “give” on any of these approaches 
calling for fantasy may be the most diffi- 
cult to understand on the basis of overt 
behavior as well. Below the age of sev- 
en, the TAT is decreasingly useful, 
while the Rorschach is shaky before 
four. Yet crucial decisions are made 
about preschool children: whether a se- 
verely retarded child should be institu- 
tionalized, whether a child showing dis- 
turbed behavior should have therapy or 
not. The children who most need care- 
ful appraisal are often the hardest to 
evaluate, since the same children who 
do not “give” freely on the projective 
tests may also be very inhibited on in- 
telligence tests, quite aside from the 
generally accepted fact that intelligence 
tests themselves are less dependable 
during the preschool period. We have 
been too slow to think through the im- 
plications of data like that of Despert, 
who reports increases in I.Q. for chil- 
dren who are experiencing an improve- 
ment in their emotional situation. In 
the Sarah Lawrence Nursery School we 
have had similar cases, and in addition
an instance of a child who, with a preschool
I.Q. of 65, was recommended for
institutionalization, but who now at the
age of 12 has an I.Q. of 147 after steady
increases from year to year as he be- 
came more adjusted. Psychologists have 
been responsible for the development in 
our culture of a rigid and not entirely 
sound way of classifying children: the 
concept of the I.Q. has percolated to the
lay public, parents as well as teachers, 
so thoroughly that the opportunities 
open to a child may often be limited by 
his achievement on one test at one 
point in time, space and emotional ad- 
justment. We have assumed on the bas- 
is of statistical reliabilities that the I.Q. 
is a measure not only of present status 
but also of potentialities. Undoubted- 
ly it is, a considerable part of the time, 
when a child is optimally adjusted and 
is giving optimal cooperation. But too 
many evidences of changes from year 
to year due to changes in these two 
matters have appeared in problem child- 
ren to make us secure in our prognosis. 
We must help to build more flexible 
attitudes toward children, their possi- 
bilities and their needs, in order to 
counteract the tendency to pigeon-hole 
a growing changing personality prema- 
turely. This is likely to be even more im- 
portant in the future, since the increas- 
ing tendency to leave children in “safe” 
playpens, kiddie-coops, the increasing 
numbers of infants in families with no 
other children to give stimulation, and 
of mothers who do not understand the 
infant’s need for mothering, appears to 
be responsible for the emotionally de- 
prived preschoolers who sometimes ap- 
pear to be quite retarded despite a “good” 
heredity and environment. Such chil- 
dren, with a general backwardness, or 
specific retardation of verbal, motor, or 
social development can be stimulated to 
normal development with special atten- 
tion or therapy. It is the psychologist’s 
task to recognize potentialities as well
as present performance; there is room 
for more work on basic problems in the 
interrelation of mental activity and
personality development as a background
for improvement in our ways of ap- 
praising the sorts of children who are 
now misjudged. 
THE K SCALE of the Minnesota
Multiphasic Personality Inventory
(MMPI) was developed in an attempt 
to correct the scores obtained on the 
personality variables proper for the in- 
fluence of attitudes toward the test situa- 
tion. The rationale of the approach as 
well as the empirical procedure em- 
ployed in deriving K has been presented 
in a previous publication [7] and will 
not be dealt with here except very
summarily. The present paper is to be read
as a sequel to the original and aims 
chiefly to present norm data on K for 
various groups, an improved technique 
for applying K statistically, and certain 
miscellaneous observations such as its 
effects on the validity and intercorrela- 
tions of the other scales of MMPI. 


The K scale was derived by studying 
the item response frequencies of certain 
diagnosed abnormals who had normal 
profiles. It was here assumed that the 
occurrence of a normal profile was sug- 
gestive of a defensive attitude in the pa- 
tient’s responses. The response frequen- 
cies were contrasted with those from an 
unselected sample of people in general 
(“normals”). The differentiating items 
were then scored so that a high K score 
would be found among abnormals with 


1 Supported by graduate research grants 
from the University of Minnesota Graduate
School.


2 Doctor McKinley’s name appears here in 
honorary recognition of the fact that his last 
research work before he became disabled was 
in connection with the development of the K
scale. 
normal curves, whereas a low score
would be found in clinical normals hav- 
ing deviant curves. In this operational 
sense, it can be said that a high K score 
is indicative of a defensive attitude, and 
a low K score suggests unusual frankness
or self-criticality (“plus-getting”).
The extremes of defensiveness and plus- 
getting may be called “faking good” and 
“faking bad” respectively. 

The earlier procedure for applying K 
was one of subjectively correcting pro- 
files on the basis of K score. Thus, a 
given borderline curve would be “under-interpreted”
if K was considerably below the mean,
since the examinee would
be presumed to have achieved a bad 
curve because of his plus-getting ten- 
dency. If the same profile occurred in 
the presence of an elevated K, the clini- 
cian would assume that the curve ought 
to be “over-interpreted,” since the ex- 
aminee showed evidence in his high K 
of having been defensive. 

In the following presentation we will 
first give the more practical data refer- 
ent to the routine use of K. Following 
the description of the determination of 
K correction factors and specific data 
on validity we will return to the more 
general facts bearing on clinical inter- 
pretation integrated with the whole pro- 
file. 

The original method of using K was 
admittedly vague and inspectional, and 
would require considerable experience 
on the part of the individual clinician. 
It was clear that the influence of the K 
factor upon scores was not the same for
all MMPI variables, so that the optimal 
interpretation of the personality scales 
proper on the basis of a given K devia- 
tion varied. It is obvious that the 
amount of experience required to make 
a satisfactory use of K in profile inter- 
pretation would be very great, even as- 
suming that the clinician would be able 
subjectively to record, retain, and ana- 
lyze the welter of impressions with ref- 
erence to the nine personality compo- 
nents. For this reason, it seemed that a 
more rigorous and objective procedure 
for taking account of the K score would 
be desirable. 

Since high K scores represent the de- 
fensive or “fake good” end of the test 
attitude continuum, the most obvious 
approach to the problem is to add K (or 
some function of K) to the raw score on 
each personality variable, i.e., increase
the score in the direction of abnormal- 
ity. Thus, a psychopath who is very de- 
fensive in taking the test is presumed 
to have attained a lower raw score on 
the Pd scale than he “should” have, i.e., 
than he would have had he been less de- 
fensive. This defensiveness will also 
tend to reflect itself as a high K score. 
The obtained score on Pd should accord- 
ingly be corrected by adding some 
amount, the amount added being depen- 
dent upon the degree of defensiveness 
present as indicated by K. The problem 
is simply one of determining the opti- 
mal weight for the K factor with respect 
to any given scale, taking a linear func- 
tion as an adequate approximation for 
practical purposes. 

Our first attempt was crude in that it 
treated K as what may be called a 
“pure” suppressor, whose only contri- 
bution lay in its correlation with the 
noncriterion components of the person- 
ality variable [5, 6]. In a preliminary 
study of the Hs scale, using an unusually
carefully selected group of diagnosed
hypochondriacs, the Hs score
was increased by a fraction of K pro- 
portional to the regression weight of Hs 
on K among the normals. In other 
words, in place of Hs alone we now 
were using the residual of Hs regressing 
on K, i.e., that part of Hs which is K- 
independent. This procedure is inexact 
since it assumes that K itself is uncor- 
related with the dichotomous criterion, 
and also because it neglects the correla- 
tion of Hs with K among the abnor- 
mals. In spite of this crudeness, it was 
encouraging to find that the corrected 
Hs score now enabled us to detect 89% 
of the hypochondriacs as contrasted 
with about 70% of the same sample us- 
ing Hs alone. This separation was 
achieved on a test group which had not 
entered into the derivation of either Hs 
or the K-weight, and involved no increase
in the number of false positives
among normals (about 5% in both 
cases). 
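
The "pure suppressor" step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the scores are invented deviate scores rather than data from the study, and the regression weight is taken to be the ordinary least-squares slope of Hs on K among normals. Because that slope is negative when Hs and K are negatively correlated, subtracting slope times K amounts to adding a fraction of K.

```python
from statistics import mean


def regression_slope(y, x):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def k_independent_part(hs, k, slope):
    """Residual of Hs on K: the part of Hs that is linearly independent of K."""
    return [h - slope * kk for h, kk in zip(hs, k)]


# Hypothetical scores for a handful of normals (invented for illustration).
NORMAL_HS = [2, 4, 6, 8, 10]
NORMAL_K = [20, 16, 14, 10, 5]

SLOPE = regression_slope(NORMAL_HS, NORMAL_K)  # negative in this toy sample
RESIDUAL = k_independent_part(NORMAL_HS, NORMAL_K, SLOPE)
```

Among the normals used to fit the slope, the residual has zero covariance with K by construction, which is the sense in which the corrected score is "K-independent."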

The desirability of taking account of 
the correlation of K with the person- 
ality scales both among normals and ab- 
normals, as well as any differentiating 
power of its own which K might have 
on certain sorts of cases, suggests the 
use of the discriminant function for de- 
termining the optimal weight. In the 
present problem, the variances among 
normals and abnormals were not always 
alike, nor was it convenient to restrict 
our analysis to the usual case of equi- 
numerous groups. We experimented 
with a modification of the discriminant 
function which added variances rather 
than sums of squares, but decided to 
reject this also for the following reason: 
The region in which differentiation is 
clinically most important is around 60 
to 80 T-score. There is little or no basis 
at present for interpreting the person- 
ality scores which are below the mean. 
All methods which are based upon maximizing
the ratio of the variance of criterion
group means to some type of
pooled variance within groups will be 
taking account of the entire distribu- 
tion. This results in a K-weight based 
upon information which there seems to 
be no reason to include. The skewness 
of MMPI variables and the obvious 
doubts one might have as to the influ- 
ence of K at different points of the dis- 
tribution led us to determine the opti- 
mal weight by a study of a more re- 
stricted region, within which refinement 
was of greatest consequence. It is un- 
fortunate that this decision entails pro- 
cedures which are mathematically inele- 
gant and in sore need of analytic justi- 
fication but we have not been able as 
yet to devise acceptable alternatives. It 
is hoped that the procedure now to be 
described will seem reasonable, and that 
others will attempt a formally simple 
solution and will study the sampling 
distribution of the test employed. In the 
present case, there is reason to suspect 
that a general maximizing solution is 
impossible without making assumptions 
regarding distribution form which are 
empirically inadmissible. 

Consider a given personality variable,
represented in deviate score form by x,
where the deviation is from the mean
of normals. Let the K deviate score be
represented by z. Let λ be an arbitrary
weight, whose optimal value is to be determined.
Optimal value refers here to
the λ which achieves the best differentiation
between a criterion group of abnormals
diagnosed as having the abnormality
in question (e.g., hypochondriasis)
and a sample of unselected
normals. In other words, we are here
considering the personality variables singly,
by specific diagnosis, rather than
“abnormals” as a whole. Then the deviate
corrected score on the given abnormal
component is

$$y = x + \lambda z$$


Let us now restrict our attention to
the cases scoring above the mean of
normals on y, i.e., consider only cases
such that x + λz > 0. We now define
a sum of squares for those abnormals
whose corrected score is above the normal
mean. That is, for cases such that
x + λz > 0, we define a sort of “half
sum of squares,”

$$SS_a = \sum_a (y)^2 = \sum_a (x + \lambda z)^2$$

The same quantity is computed for the
normals,

$$SS_n = \sum_n (y)^2 = \sum_n (x + \lambda z)^2$$

The ratio of these two sums of squares,
which we shall call the differential ratio,

$$\frac{SS_a}{SS_n} = \frac{\sum_a (x + \lambda z)^2}{\sum_n (x + \lambda z)^2}$$

is then taken as an index of the degree
of differentiation achieved by a given
value of λ.

It can almost be seen by inspection
that a straightforward analytic solution
for the optimal λ cannot be carried
through by maximizing this ratio, since
the number of cases involved in numerator
and denominator will occur in the
resulting derivative and will itself fluctuate
with the choice of a λ in a manner
that cannot be known without special
specifications of the joint distribution
of x and z. Even if special assumptions
are made, such as normal bivariate surface
and equal correlation for normals
and abnormals (neither being true in
this sort of material), the solution of
the problem presents serious mathematical
complications. We hope to be
able to make further progress in this
direction and invite more mathematically
competent readers to attack the
general case. We fell back upon a
straight trial-and-error method. We assigned
arbitrary values of λ (= .1, .2,
.3, .4, etc.) and for each of these values
we distributed y for normals and criterion
cases separately. The ratio
SS_a/SS_n was then calculated for each of
these λ values, and these ratios plotted
as a function of λ. A smooth curve was
drawn by inspection through the plotted
points, and a rough maximum was estimated
therefrom. Where several different
samples of abnormals were available,
such curves were drawn separately
for each, in the hope of having more
confidence in the estimated maximum on
the basis of agreement in curve “trend.”

One further qualification needs to be
mentioned. Since squares emphasize extreme
deviations, and in view of what
has been said above concerning “clinically
important range,” it was felt desirable
to limit the influence of extreme
deviations upon the ratio. Therefore,
after the distribution of y for a given λ
had been obtained, all scores of the normal
and abnormal groups lying above
three standard deviations on the basis
of a given x + λz normal distribution
(corrected T score of 80) were arbitrarily
reduced to that value. A change
in λ which produced further elevations
of abnormals already at three sigma
would therefore not result in further
improvement in the differential ratio. It
is possible that four sigma should have
been chosen instead, since recent work
on pattern analysis in differential diagnosis
among abnormals suggests that
elevations above three sigma may be important.
In fact, we would not be prepared
to vigorously defend the use of
this restriction at all.
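
The trial-and-error search, with the three-sigma ceiling, can be sketched as follows. This is a minimal illustration and not a reproduction of the original hand computation: the function names and the small sets of deviate scores are hypothetical, the inputs are assumed to be deviations from the normal means already, and the population standard deviation of the corrected normal scores is assumed as the sigma for the ceiling.

```python
from statistics import pstdev


def corrected_scores(x, z, lam):
    """Deviate corrected scores y = x + lam * z, one per case."""
    return [xi + lam * zi for xi, zi in zip(x, z)]


def half_sum_of_squares(y, cap):
    """Sum of squares of scores above zero, scores above `cap` reduced to it."""
    return sum(min(yi, cap) ** 2 for yi in y if yi > 0)


def differential_ratio(abn_x, abn_z, nor_x, nor_z, lam):
    """SS_a / SS_n for a single trial weight lam."""
    y_a = corrected_scores(abn_x, abn_z, lam)
    y_n = corrected_scores(nor_x, nor_z, lam)
    cap = 3 * pstdev(y_n)  # three sigma of the corrected normal distribution
    ss_n = half_sum_of_squares(y_n, cap)
    if ss_n == 0:
        raise ValueError("no normal case lies above the normal mean")
    return half_sum_of_squares(y_a, cap) / ss_n


def best_weight(abn_x, abn_z, nor_x, nor_z, grid):
    """Trial weight in `grid` giving the largest differential ratio."""
    return max(
        grid,
        key=lambda lam: differential_ratio(abn_x, abn_z, nor_x, nor_z, lam),
    )


# Hypothetical deviate scores (invented; deviations from the normal means).
NOR_X = [-3, -1, 0, 1, 3]
NOR_Z = [1, -2, 0, 2, -1]
ABN_X = [1, 2, 0, 3]
ABN_Z = [2, 1, 3, 0]
GRID = [0.0, 0.5, 1.0]

CHOSEN = best_weight(ABN_X, ABN_Z, NOR_X, NOR_Z, GRID)
```

Picking the largest value on a coarse grid stands in here for drawing a smooth curve by inspection and estimating its maximum; a finer grid would serve the same purpose.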

The graphical method used gave opportunity
to observe the behavior of the
differential ratio as λ was varied, and
to check the degree of disparity with
other indicators of separation. In general,
it was found that the λ which maximized
the ratio tended to agree fairly
well with that selected by such measures
as per cent abnormals above the top
decile of normals. Research on a different
problem suggests that the differential
ratio (d.r.) gives results similar to,
but not identical with, the
critical ratio. In the present study, the
λ’s finally chosen were sometimes based
upon compromises between the curve
maxima of d.r. for various criterion
groups, as well as counting measures of
overlap.
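
A counting measure of overlap of the kind mentioned here, the per cent of abnormals at or above the top decile of normals, might be computed as in the sketch below. The function name and the scores are hypothetical, and the cutoff is assumed to be the lowest score among the highest-scoring tenth of the normals; with tied scores, somewhat more than a tenth of the normals may then lie at or above it.

```python
def percent_above_normal_cutoff(abnormal, normal, top_fraction=0.10):
    """Per cent of abnormal scores at or above the cutoff reached by the
    top `top_fraction` of normal scores."""
    ranked = sorted(normal, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    cutoff = ranked[n_top - 1]  # lowest score still inside the top group
    hits = sum(1 for score in abnormal if score >= cutoff)
    return 100.0 * hits / len(abnormal)


# Hypothetical corrected scores (invented for illustration).
NORMAL_SCORES = list(range(1, 21))      # 20 normals, scores 1 through 20
ABNORMAL_SCORES = [10, 19, 20, 25]

PCT = percent_above_normal_cutoff(ABNORMAL_SCORES, NORMAL_SCORES)
```

Repeating this count for each trial weight gives a second curve against which the maximum of the differential ratio can be checked.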
FIG. 1. Values of the “differential ratio” as
a function of the size of the K-weight (λ),
for the Sc scale.


Figure 1 shows typical data on the
d.r. as applied to Sc + λK. Groups I
and II are composed of 25 and 28 males
diagnosed schizophrenia and groups III
and IV of 24 and 14 female cases respectively.
There were some minor differences
in the clinical constitution of
the four groups but since the curves
were similar in maximum points these
differences can be disregarded. From
these data we chose the λ weight for Sc
to be 1K.

TABLE I

THE K WEIGHTS OF THE SCALES AFFECTED
BY THE K CORRECTION

Hs + .5K
Pd + .4K
Pt + 1.0K
Sc + 1.0K
Ma + .2K
Table I gives the K-weights which
were finally adopted by these proce- 
dures. It must be emphasized that these 
weights are optimal, within our sample, 
for the differentiation of largely in-pa- 
tient psychiatric cases of full-blown psy- 
choneurosis and psychosis from a gen- 
eral Minnesota “normal” group. For 
other clinical purposes it is possible that 
other λ-values would be more appropriate.
Thus, it seems likely that for the
best separation of “maladjusted nor- 
mals,” such as those which abound in a 
college counseling bureau and would be 
formally diagnosed in a psychiatric clin- 
ic as simple adult maladjustment, other 
weights might be better. 

The mode of applying these weights 
has been described already in the sup- 
plementary manual for the MMPI pub- 
lished by The Psychological Corpora- 
tion. This manual contains a set of 
tables to be used in making the K-cor- 
rection, and new test blanks are also 
available to be used with K. Briefly one 
determines the weighted K-value by re- 
ferring to the table, which is based di- 
rectly upon the proportions just cited. 
Thus, in correcting Hs for K, one begins 
by determining .5K either mentally or 
from the table (K here is the raw 
score). This quantity is then added to 
the original raw score on Hs, to yield 
Hs + .5K. This sum is called the cor- 
rected raw score on Hs. This corrected 
raw score is then entered in a second 
table of Tc (corrected T scores). This 
T table is of course based upon the mean 
and SD (on general normals) of the 
quantity Hs + .5K. Similar processes 
are involved in the case of the other 
scales. 
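
The arithmetic of the correction can be sketched as below. The weights are those of Table I; everything else is an assumption made for illustration: the function names are hypothetical, the normal mean and standard deviation passed to the T conversion are placeholders rather than published norms, and the linear conversion T = 50 + 10 (score - mean) / SD is assumed from the statement that three standard deviations corresponds to a T score of 80. In practice the manual's tables are used in place of this computation.

```python
# K weights from Table I; scales not listed receive no correction.
K_WEIGHTS = {"Hs": 0.5, "Pd": 0.4, "Pt": 1.0, "Sc": 1.0, "Ma": 0.2}


def corrected_raw(scale, raw, k_raw):
    """Corrected raw score: raw score plus the weighted raw K score."""
    return raw + K_WEIGHTS.get(scale, 0.0) * k_raw


def t_score(value, normal_mean, normal_sd):
    """Linear T score relative to the normal mean and SD of the same quantity."""
    return 50 + 10 * (value - normal_mean) / normal_sd


# Example with invented numbers: raw Hs of 10 and raw K of 16.
HS_CORRECTED = corrected_raw("Hs", 10, 16)          # 10 + .5 * 16
HS_T = t_score(HS_CORRECTED, normal_mean=12.0, normal_sd=4.0)  # placeholder norms
```

Note that the mean and SD must be those of the corrected quantity (here Hs + .5K) among general normals, not those of the uncorrected scale.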

It will be noted that only five scales 
receive a K-correction. The scales D, 
Hy, and Pa are uncorrected. (Mf was 
not studied in this respect.) It may 
seem paradoxical that in the original article
on K, we found only three scales
on which K would “work.” This finding
was based on a crude test using a single 
cutting score and a chi-square analysis. 
In the case of Pt, the sample studied 
originally was very small and signifi- 
cance hard to attain. The present K 
weights are to be accepted as the best 
indication of the “improvement” 
achieved by K, rather than the chi- 
squares cited previously. 

We see then that Pt and Sc are most 
affected by K, Hs and Pd mediumly, 
Ma least and D, Hy, and Pa not at all. 
It is interesting to speculate upon this 
hierarchy. It will be recalled that the 
scale D already contains a correction 
scale, called Cd [4]. Hy already con- 
tains the K-factor in the Hy-subtle 
items [7, 8]. While Pa has no explicit
correction scale, almost one-fourth of
the Pa items are of the Hy-subtle type
(eight being actually “0” items on the
scoring key). These items, with the indicated
response scored for Pa, are as
follows:
B-54 “My mother or father often made me 
obey even when I thought that it was 
unreasonable.” (F) 

“Most people inwardly dislike putting 
themselves out to help other people.” 
(F) 

“Most people are honest chiefly through 
fear of being caught.” (F) 

“I think most people would lie to get 
ahead.” (F) 

“I think nearly anyone would tell a lie 
to keep out of trouble.” (F) 

“Most people will use somewhat unfair 
means to gain profit or an advantage 
rather than to lose it.” (F) 

“The man who provides temptation by 
leaving valuable property unprotected is 
about as much to blame for its theft as 
the one who steals it.” (F) 

“Some people are so bossy that I feel 
like doing the opposite of what they re- 
quest, even though I know they are 
right.” (F) 

“I tend to be on my guard with people 
who are somewhat more friendly than 
I had expected.” (F) 


D-46 
TABLE II

THE EFFECT OF THE FINAL K CORRECTIONS ON TEST CASE GROUPS CONTRASTED TO A
STANDARD SAMPLE OF 200 NORMAL CASES

The values given are the per cent of cases at or above the given T score points.

Hs (101 Hs test cases)              Hs + .5K (101 Hs test cases)
T       200 Normals   Test Cases    T       200 Normals   Test Cases
70.0    3             59            70.0    5.5           74
69.8    5             62            70.3    5             72
65.0    10            69            62.9    10            89

Hy (101 Hs test cases)              Hs (74 Hy test cases)
T       200 Normals   Test Cases    T       200 Normals   Test Cases
70.0    4             64            70.0    3             38
67.5    5             74            69.2    5             42
63.3    10            79            65.0    10            55

Hs + .5K (74 Hy test cases)         Hy (74 Hy test cases)
T       200 Normals   Test Cases    T       200 Normals   Test Cases
70.0    5.5           54            70.0    4             53
70.3    5             51            67.5    5             62
62.9    10            69            63.3    10            66

Pd (89 Pd test cases)               Pd + .4K (89 Pd test cases)
T       200 Normals   Test Cases    T       200 Normals   Test Cases
70.0    5             52            70.0    3.5           55
70.0    5             52            67.5    5             65
65.0    10            65            62.7    10            76

Pt (36 Pt test cases)               Pt + 1K (36 Pt test cases)
T       200 Normals   Test Cases    T       200 Normals   Test Cases
70.0    6.5           42            70.0    4             61
71.5    5             40            68.5    5             67
67.2    10            47            64.0    10            67

Sc (91 Sc test cases)               Sc + 1K (91 Sc test cases)
T       200 Normals   Test Cases    T       200 Normals   Test Cases
70.0    4.5           31            70.0    2             59
69.0    5             31            64.0    5             69
62.5    10            43            61.2    10            15

Ma (89 Ma test cases)               Ma + .2K (89 Ma test cases)
T       200 Normals   Test Cases    T       200 Normals   Test Cases
70.0    8             62            70.0    2.5           65
66.3    5             72            65.7    5             74
61.8    10            79            63.1    10            84

If one calculates the percentage of
“0” items for each of the eight personality
scales (excluding Mf), the proportion
of such items per scale is Hs = 0%,
Sc = 3%, Pt = 4%, Ma = 15%,
Pd = 16%, Pa = 20%, D = 27%,
Hy = 33%.

These figures at least suggest that the 
proportion of zero items per scale tends 
to be negatively associated with the K- 
weight found to be optimal. One way 
of looking at this finding is to say that 
scales which are more subtle are less 
subject to distortion by such test-atti- 
tudes as K, and hence cannot be im- 
proved much by application of a K-cor- 
rection. It cannot be decided on present 
evidence whether this is the correct 
view rather than the view that the sub- 
tle items, although not derived as sup- 
pressors, already contain “suppressor” 
components for the obvious items. 


THE EFFECT OF THE K CORRECTION ON 
VALIDITY AS RELATED TO DIAGNOSIS 


Table II gives an idea of the diagnos- 
tic effect achieved by the K correction. 
The cases designated “test cases” were 
not always clear cases of the given 
diagnostic category but represented pa- 
tients who were noted by the psychiatric 
staff as being at least in part charac- 
terized by traits belonging to the cate- 
gory. Hence it is probably fair to as- 
sume that the percentages of these cases 
lying above the three given T values are 
smaller than would be true of more care- 
fully selected patients. The 200 stand- 
ard sample normal records used as ref- 
erence were made up of 100 males and 
100 females from the general normative 
files who were specially selected to be 
representative of the whole population. 

For Hs and Hs + .5K, the data are 
given on both Hs and Hy test groups. 
The figures for these groups as distrib- 
uted by Hy are also included. (See also 
Table III.)

TABLE III

COMPARISON OF THE ACTION OF Hs + .5K AND
Hy ON TEST CASES DIAGNOSED PSYCHONEUROSIS,
HYPOCHONDRIASIS AND
PSYCHONEUROSIS, HYSTERIA

                     Hs Test Cases       Hy Test Cases
                     Per Cent with       Per Cent with
                     T score 70          T score 70
                     and above           and above
On Hs + .5K alone    16                  9
On Hy alone          6                   12
On both scales       58                  41

One may compare not only
the Hs with Hs + .5K but also Hs +
.5K with Hy. It is apparent from Table
II and from correlational data that the 
addition of K to Hs makes it act more 
like Hy. This could be predicted from 
the communality of K and Hy-subtle 
[7]. In terms of the group data of 
Tables II and III, one is justified in us- 
ing both scales. As was argued in an 
earlier publication [8] clinical evidence 
is at present in favor of the continued 
use of both scales because they are com- 
plementary when operating in the in- 
dividual case. For example, Table III 
indicates that the joint use of both scales 
results in the identification of more hy- 
pochondriacs and hysterics than would 
the use of either separately. We hope 
soon to publish further data relative to 
the clinical significance of the two scales 
used together. 
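
The three rows of Table III partition the cases identified by either scale. For the Hs test cases, 16 + 6 + 58 = 80 per cent reach T = 70 on at least one scale, against 74 per cent (16 + 58) on Hs + .5K and 64 per cent (6 + 58) on Hy taken singly. A tally of this kind can be sketched as follows; the function name and the T scores are hypothetical and are not the study's records.

```python
def joint_elevation_percentages(cases, cutoff=70):
    """Per cent of cases elevated on the first scale only, the second only,
    both, or neither. `cases` is a list of (t_first, t_second) pairs."""
    counts = {"first_only": 0, "second_only": 0, "both": 0, "neither": 0}
    for t_first, t_second in cases:
        high_first = t_first >= cutoff
        high_second = t_second >= cutoff
        if high_first and high_second:
            counts["both"] += 1
        elif high_first:
            counts["first_only"] += 1
        elif high_second:
            counts["second_only"] += 1
        else:
            counts["neither"] += 1
    total = len(cases)
    return {name: 100.0 * n / total for name, n in counts.items()}


# Invented (Hs + .5K, Hy) T-score pairs for four cases.
EXAMPLE_CASES = [(75, 72), (71, 60), (65, 80), (50, 55)]
RESULT = joint_elevation_percentages(EXAMPLE_CASES)
IDENTIFIED_BY_EITHER = 100.0 - RESULT["neither"]
```

The gain from joint use is the sum of the two "alone" percentages over whichever single scale is being compared.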

The gains for Pd + .4K over Pd are
most marked at the 5th and 10th percentiles.
This results from a flattening
of the frequency curve for the normals
in the range of 60 to 70 T-score. We
have already tended to interpret Pd as
having clinical significance at around 
T = 65 when it appears as a clear 
“spike” or when certain other values 
(especially the neurotic triad) are be- 
low 50. The above data probably add 
justification to this interpretation. 
The increased validity of Pt + 1K is 
a function of both increased normality 
in the frequency curve for normals and 
relatively higher scores for the test
cases. The Pt + 1K is likely to be more 
clearly a “clinical” scale than was the 
Pt. We have pointed out [7] that Pt is 
a rather good measure of the K factor 
and one would expect partial removal 
of this variance to result in a remainder 
“purer” for the real clinical component. 
Good clinical data on psychasthenia 
relatively independent of schizophrenia 
are difficult to obtain and we can give 
no further evidence at this time. 

Sc was never a very satisfactory
scale in terms of the number of schizophrenic
patients identified, although
when it is elevated Sc is quite valid.
When Sc + 1K is used, a very gratify- 
ing improvement is apparent. These 
gains with a K correction are from all 
standpoints the best of the five scales. 

The improvement of Ma + .2K over
Ma is not great but if the effect upon 
the frequency curve for normals is com- 
bined with that on the test group, it is 
definitely worthwhile to use the correc- 
tion. Ma is the most common single devi- 
ate score in both high and low directions. 
Among the profiles of unselected nor- 
mals, Ma occurs as a “peak” score more 
often than does any other scale, and it 
also occurs as a lowest score more often 
than any other. This is presumably a 
statistical consequence of the fact that 
Ma correlates with the other scales less 
than they tend to correlate among them- 
selves. Ma probably has more indepen- 
dent clinical significance than any other 
single scale. These facts add to the im- 
portance of any gains in validity. 


THE GENERAL INTERRELATIONSHIP OF K 
AND K CORRECTED SCALES 


Table IV shows the means and stand- 
ard deviations for various groups. We 
attribute no certain significance to the 
variations that can be observed in these 
statistics. The normals designated in
this table are the general normals that
have been described elsewhere in publications
on the MMPI as a reasonably
satisfactory cross section of Minnesota
residents. While there are several possible
sex difference trends as seen in the
separate age groups, a grand compilation
of all these normal males contrasted
to all of the females shows no appreciable
differentiation.

TABLE IV

THE K MEANS AND STANDARD DEVIATIONS
OF VARIOUS GROUPS

Group                                                  No.           Sex   Mean          Sigma
Normals age 16-25 inc.                                 115           F     12.61         4.96
Normals age 16-25 inc.                                 73            M     13.79         5.27
Normals age 26-35 inc.                                 153           F     [illegible]   [illegible]
Normals age 26-35 inc.                                 105           M     12.41         5.85
Normals age 36-45 inc.                                 105           F     10.41         4.60
Normals age 36-45 inc.                                 69            M     12.49         5.14
Normals age 16-45 inc.                                 373           F     12.08         5.07
Normals age 16-45 inc.                                 247           M     12.84         5.64
Mixed Psychiatric                                      372           M     14.57         5.85
Mixed Psychiatric                                      596           F     14.34         5.21
University                                             [illegible]   M     16.10         [illegible]
University                                             [illegible]   F     [illegible]   [illegible]
University (Drake Wisconsin)                           [illegible]   F     [illegible]   4.20
High School (Capwell)                                  [illegible]   F     14.96         5.46
Reform School (adolescent) (Capwell)                   [illegible]   F     [illegible]   [illegible]
Reformatory (adult) (Capwell)                          [illegible]   F     14.18         4.46
Graduate Electrical Engineers (Minneapolis-Honeywell)  100           M     16.72         4.19
Miscellaneous Employed (American Airlines)             104           F     15.38         [illegible]

The two groups referred to as “mixed
psychiatric” included all diagnoses observed
in the psychiatric unit and are
not necessarily typical of a psychiatric
hospital of the usual type. Many of the
patients presented behavior problems of
types that would not be committed to an
institution for the insane and in general
the group would be a borderline group
between the obviously psychotic and the
normal. The moderate rise in the means
for these groups is chiefly contributed
by the psychopathic personality and
criminal individuals who would make up
about 20% of the whole number. University
students have a relatively higher
mean as contrasted to general normals
of their age range. An interesting point
is evident in the means for the Capwell
[1] girls. Here the reform school cases
obtain a lower mean than otherwise
similar adolescents in high school. This 
low mean is contradictory to the ten- 
dency that we observed in adult offend- 
ers which is illustrated by the reforma- 
tory women whose mean score is some- 
what higher than the general norm. It 
is of interest in this connection that a K 
correction slightly decreased the differ- 
entiation of the Capwell cases from 
their matched partners when one used 
the Pd scale as a discriminator. We have 
no explanation at present for this find- 
ing. 

The largest mean that we have ob- 
served was obtained from the graduate 
electrical engineers. These men were 
studied during the war and were mostly 
around 30 years of age. They were ex- 
empted from military duty in order to 
carry on aviation research and at the 
time of testing were applying for spe- 
cial airplane control testing at high alti- 
tude. The final group of miscellaneous 
employed was obtained from a sample 
of airline employees most of whom were 
college graduates or had several years 
of college work. These were in more 
skilled clerical or minor administrative 
type positions. 

We have described elsewhere [7] ex- 
periments in which ASTP men and sev- 
eral other groups were asked to fake 
good and bad profiles on the MMPI. In 
these experiments half the class faked 
a good or bad profile and the other half 
took the Inventory in a supposedly hon- 
est way. At a subsequent session of the 
class, the roles of these two groups were 
reversed. All of the subjects were naive 
in regard to personality inventories and 
in regard to the Multiphasic in particu- 
lar. 

This procedure afforded a check upon 
the action of F and L as well as K. In 
brief, it was found that F was very 
efficient in distinguishing faked bad records
but L was not at all effective in
detecting a faked good record among the 
men and was only moderately effective 
with women. We at first presumed that 
the failure of L for men was in part 
due to the relatively obvious items of 
which L is composed. 

Among 48 student nurses asked to 
fake a good record, 16, or 33%, obtained 
a raw score L greater than or equal to 
7 (T score greater than or equal to 60) 
in contrast to only one out of 48 when 
the same girls took the test with a sup- 
posedly honest attitude. If a raw score 
L greater than or equal to 6 (T score 
greater than or equal to 56) is used, 
these figures become 54% identified as 
faked for the faked records, as con- 
trasted to 10% “false positive” among 
the honest records. This finding accords 
with our clinical experience that it is 
profitable to begin interpretation of L 
at T = 60 or even lower [7]. 

When we turn to the K distributions 
for these two groups, the most interest- 
ing findings are that the mean K score 
for the 48 nursing students taking the 
Inventory “honestly” is 18.3, standard 
deviation 3.80 and the corresponding 
statistics for 107 ASTP men are 19.8 
and standard deviation 4.10. These two 
means correspond to general normal T 
values of about 61 and 63 respectively. 
These means are definitely larger even 
than the means of college students in 
general as given in Table IV. Some factor
seems to have operated on these two
experimental groups to produce an unusually
high average value of K when
they were supposedly taking the test 
with an honest attitude. 

As might be expected, when data 
were obtained from the 54 of the ASTP 
men faking a bad profile, the mean K 
values shifted markedly downward. The 
statistics for this group of faked bad 
data are a mean of 8.1 and standard 
deviation 4.04. The mean corresponds 
to a normal T value of 41. We have no
corresponding statistics for women. K 
is in this case equally sensitive with F 
in differentiating the faked bad profiles. 
By contrast, the average K score for the 
53 ASTP men who attempted to fake a 
good score was very little different from 
their normal mean as given above. The 
mean of this group’s faked good records 
was 20.2 and the standard deviation 
3.66. The mean would correspond to a 
normal T of 64. This result was similar 
to the finding among the student nurses 
who obtained a mean K score of 19.7 
and a standard deviation 3.90 on faked 
good records. The normal T score for 
this mean is about 64. It should be kept 
in mind that when the ASTP men at- 
tempted to fake a bad profile the result- 
ing profiles were very severe, differed 
to a remarkable extent from the indi- 
vidual’s “honest” profile and could be 
recognized as invalid from the profile 
form alone. In contrast again to this, 
neither the men’s nor the women’s faked 
good profiles could readily be distin- 
guished in any consistent way from 
their honest profiles, nor from the ordi- 
nary profiles of normal persons in gen- 
eral. The obvious experiment in which 
one would take a group who had deviant
profiles and ask them to attempt to fake 
good was not performed. Further evi- 
dence on the behavior of K and F in the 
“fake bad” situation can be found in a 
recent article by Gough [2]. 

In consideration of these data, it 
seems justifiable to postulate that in 
these experiments the differentiation of 
the faked good profiles by the use of K 
is impossible because the “honest” was 
already in some sense faked good. The 
evidence that the “honest” represented 
something already related to faking can 
be derived from the fact that the “hon- 
est” means of both these groups were 
more than a standard deviation elevated 
in terms of the general normal mean 
statistics for K and at least half a standard
deviation in T score above the
means obtained from other college data.
This elevation over the three other
means given in Table IV would be even
greater in standard scores on the basis 
of the college data considered as norms. 
Some unidentified factor related to K 
must have operated in the experimental 
situation where the faking data were 
obtained. This latter assumption would 
be more certain if the rise in the means 
had not been observed from such differ- 
ent groups as student nurses and ASTP 
men. It is possible, however, to link 
these two groups provisionally in one 
significant element. Both the nurses and 
the men were under impulsion not to 
jeopardize in any possible way their 
continuance in the war-related pro- 
grams that they were following. This 
pressure would contrast to the situation 
of the miscellaneous college students 
who were tested either before or after 
the war and probably in even greater 
degree to the attitudes of the general 
MMPI norm groups that provided the 
normative statistics for the T table of 
K. 

The nearest approach to data on 
faked good scores as obtained from per- 
sons with initially deviant profiles is 
embodied in some incidental data ob- 
tained from our records where psychia- 
tric hospital patients repeated the In- 
ventory for one reason or another. By 
searching the duplicate records, we were 
able to find a few cases where patients 
had taken the Inventory twice and 
where the K raw score for the second 
test was four or more points higher 
than that for the first test. Most of 
these patients had originally deviant 
profiles. The obtained differences are 
not worthy of statistical analysis but all 
scales show a tendency to decrease in T 
score under these conditions. The most 
marked changes occurred on Hs, D, Pt
and Sc. Naturally, since these patients
were not asked to fake a good score, the 
finding yields only presumptive evi- 
dence. 


CORRELATIONAL DATA 


The test-retest correlation of K is 
available on two groups. For a group 
of 85 high school girls (Capwell data) 
retested at an interval of 110 to 410 
days the correlation was .72. For a 
group of miscellaneous normals retest- 
ed after four days to one year the cor- 
relation was .74. It is of course impos- 
sible to say to what extent these coeffi- 
cients are to be viewed as indicators of 
“reliability.” 

A second question that may be raised 
regards the effect of the K-correction 
upon the intercorrelations of the other 
MMPI scales. Table V gives the corre- 
lation coefficients for the same group 
with and without the K-correction hav- 
ing been made before correlating. The 
intercorrelations for the original scores 
are indicated in ordinary type, while the 
corresponding coefficient upon the same 
sample after making the K-correction is 
indicated immediately to the right of the 
originals in bold-face. These coefficients 
are based upon a sample of 100 normal 
males, all college graduates, employed 
as engineers in an industrial concern 
(Honeywell cases of Table IV). We see
that some of the correlations are raised
by the K-correction, that others are low- 
ered, and that this is true whether they 
are considered in the absolute or alge- 
braic sense. The increases preponderate 
over the decreases. Inspection suggests 
that the greatest shifts occur in the case 
of pairs of scales one of which suffers 
a considerable K-correction and the oth- 
er none (e.g. Hy and Pt). We are not 
prepared to give any special interpreta- 
tion of this table and include it here 
only for the sake of completeness. 
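
The before-and-after comparison reported in Table V can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the K fractions are the weights customarily cited for the MMPI (0.5 K for Hs, 0.4 K for Pd, 1.0 K for Pt and Sc, 0.2 K for Ma) and are not given in this section, and the six sets of scores are invented for illustration and are not the Honeywell data.

```python
from math import sqrt

# Assumed K fractions (customary MMPI weights; not stated in this section).
K_WEIGHTS = {"Hs": 0.5, "Pd": 0.4, "Pt": 1.0, "Sc": 1.0, "Ma": 0.2}


def k_correct(raw, scale, k):
    """Add the assumed fraction of K to a raw score; uncorrected scales pass through."""
    return raw + K_WEIGHTS.get(scale, 0.0) * k


def pearson_r(xs, ys):
    """Product-moment correlation of two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented raw scores for six hypothetical subjects.
hy = [18, 22, 20, 25, 19, 23]   # Hy receives no K-correction
pt = [12, 8, 10, 5, 11, 9]      # Pt receives the full K
k = [10, 16, 13, 20, 12, 17]

pt_corrected = [k_correct(p, "Pt", kk) for p, kk in zip(pt, k)]

r_before = pearson_r(hy, pt)
r_after = pearson_r(hy, pt_corrected)
print(round(r_before, 2), round(r_after, 2))
```

In this invented case the Hy-Pt correlation changes sign once K is added to Pt, which is the kind of shift noted above for pairs in which only one scale is corrected.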


SUMMARY AND CONCLUSIONS 


Specific arguments and data are pre- 
sented establishing the rationale of us- 
ing the K factor as a suppressor on cer- 
tain MMPI clinical scales. Five scales 
seem to be improved by the correction, 
as indicated by increased correspond- 
ence between scores and clinical status. 
The scales Pt, Sc, Hs, Pd, and Ma re- 
ceive K-corrections of varying amounts. 
The scales Hy, D, Mf and Pa are not so 
treated nor is it established that the K- 
score should be taken into account sub- 
jectively in evaluating them. A new sta- 
tistic was used to determine the K cor- 
rection factors. This statistic, called the 
differential ratio, is described as appro- 
priate to establishing maximal differen- 
tiation between two distributions with 
emphasis upon the region of their overlap.


TABLE V
CORRELATIONS AMONG SCALES BEFORE AND AFTER K-CORRECTION.
LATTER ARE IN BOLDFACE. N = 100 EMPLOYED
MALE COLLEGE GRADUATES

Hs D Hy Pd
Hs
D 33 33
Hy 338 65 31
Pd 28 37 36 37 25 47
) i. ie sy 26 21 28 22
Pa 08 22 17 37 15 24
Pt 47 47 28 45
Sc 51 59 20 26
Ma 25 04 -07-05 -13 00 32 01
K -25 08 53 09
-17 38 33 43

Mf Pa Pt Sc Ma
33
41 44 16 45
45 32 19 39 72 66
20 34 06 12 50 26 53 21
-08 22 -65 -46 -42

Normative statistics on the distribu- 
tions of K for various groups are pre- 
sented. 

The chief finding of interest here is a 
tendency for college and college-educat- 
ed persons to deviate in the upward di- 
rection between one-half and one stand- 
ard deviation. It was suggested in the 
original article on K that this difference 
is chiefly a function of socio-economic 
status. 

Some evidence was presented to show 
that K behaves in the expected manner 
when persons attempt to fake a “bad” 
profile, although the corresponding ef- 
fect in faking “good” was not demon- 
strated on any experimental group. 
Some clinical support for this latter 
effect has been found. 

The addition of K had a variable 
effect on the intercorrelations of clini- 
cal scales. There seems to be some in- 
dication that the optimal amount of K- 
correction for a given clinical scale is 
inversely related to the proportion of 
“subtle” items the scale already con- 
tains. 

It is suggested that the K-correction 
should be made routinely by users of the 
MMPI and that old records should be
scored and redrawn if any research or
validation study is to be carried on.


A PRELIMINARY APPRAISAL OF WECHSLER-BELLEVUE
SCATTER PATTERNS IN SCHIZOPHRENIA¹


By SOL L. GARFIELD²


VETERANS ADMINISTRATION HOSPITAL
MENDOTA, WISCONSIN³


THIS STUDY represents an attempt
to analyze the performance of
schizophrenics on the Wechsler-Bellevue 
Adult Intelligence Scale in terms of 
possible group psychometric patterns. 
Such previous investigators as Gilliland, 
Wittman and Goldman [1], Magaret 
[2,3], Rabin [4,5], Rapaport [7] and 
Wechsler [8] have used different meth- 
ods of analysis with similar subjects 
and have reported somewhat divergent 
findings. Since various patterns have 
been offered for possible diagnostic 
purposes, such an evaluation was 
deemed to have both practical and 
theoretical significance. 


SUBJECTS 


The subjects used in this study con- 
sisted of 67 schizophrenic patients and 
46 nonschizophrenic and nonpsychotic 
patients who were all hospitalized in 
this institution. The latter group was 
utilized as a control group to test the sig- 
nificance of certain patterns, and con- 
sisted of 21 psychoneurotics, 10 psycho- 
pathic personalities, 11 chronic alco- 
holics and four cases of simple adult 
maladjustment. The subclassifications
within the schizophrenic group were as
follows: catatonic, 7; hebephrenic, 13;
paranoid, 14; simple, 6; and “unclassified,”
27. All the subjects were male
veterans. The average I.Q. of the
schizophrenic group was 98 (S.D., 16.9)
and that of the control group was 104.5
(S.D., 14.8), indicating some comparability
between the two groups in central
tendency as well as variability of I.Q.
The two groups were also similar in age,
with the mean age of the schizophrenic
subjects being 33.09 years (S.D., 11.08)
compared to the controls’ mean age of
33.2 (S.D., 9.96). Most of the records
were secured during the routine psychological
examinations administered to
patients upon their admission to this
hospital.

ciation, Chicago, Ill., May 3, 1947.


PROCEDURE 


All eleven subtests were administered 
to each patient. However, the mean 
subtest score utilized in this study was 
obtained from the 10 subtests exclusive
of Vocabulary. This was done to
facilitate comparisons with other investigators.
The deviation of each individual’s
weighted subtest scores
from his own mean was then computed,
affording an analysis of intratest
deviation for each subject. After preliminary
analysis in tabulating these
deviations, it was considered feasible to 
place them into five intervals. Any 
subtest deviation of less than 1.5 in 
either direction from the mean was con- 
sidered not significant and placed in a 
middle category. Deviations of 1.5 to
2.49 in either direction from the mean
were considered as plus or minus one 
deviation. Any deviations of 2.5 or over 
in either direction were similarly con- 
sidered as plus or minus two deviations 
from the central tendency. The subtest 
deviations for both the schizophrenic 
and control groups were then analyzed 
in terms of these classifications, allow- 
ing for a direct comparison of the fre- 
quency of deviation between the two 
groups. It should be emphasized that 
this method of analysis is one which has 
the greatest value for clinical practice. 
Instead of averaging the scores on each 
subtest for all the subjects, or of aver- 
aging the deviations on each subtest for 
all the subjects, one actually tallies the 
frequency of specific deviations for each 
individual subject in designated clinical 
groups. This procedure follows the clini- 
cal approach suggested by Wechsler [8] 
and adheres closely to practical usage. 
Such a procedure, furthermore, is the 
actual practical test which any diag- 
nostic pattern must pass if it is to be 
successful. 
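
The tallying procedure described above can be sketched in Python as follows. This is a minimal illustration, not the computation actually used in the study: the function names and the sample record are hypothetical, and the scores are invented. The mean is taken over the ten subtests exclusive of Vocabulary, and each deviation is placed in one of the five intervals (-2, -1, 0, +1, +2).

```python
from collections import Counter
from fractions import Fraction


def classify_deviation(deviation):
    """Place one deviation from the subject's own mean into the five intervals."""
    magnitude = abs(deviation)
    if magnitude < 1.5:
        return 0                      # not significant, middle category
    step = 1 if magnitude < 2.5 else 2
    return step if deviation > 0 else -step


def subtest_categories(weighted_scores):
    """Classify every subtest against the mean of all subtests except Vocabulary."""
    base = [s for name, s in weighted_scores.items() if name != "Vocabulary"]
    mean = Fraction(sum(base), len(base))   # exact mean avoids rounding at the cut-offs
    return {name: classify_deviation(score - mean)
            for name, score in weighted_scores.items()}


def tally_by_subtest(records):
    """Count, for each subtest, how many subjects fall in each interval."""
    tally = {}
    for record in records:
        for name, category in subtest_categories(record).items():
            tally.setdefault(name, Counter())[category] += 1
    return tally


# Invented weighted scores for one hypothetical subject.
record = {
    "Information": 10, "Comprehension": 7, "Arithmetic": 10,
    "Digit Span": 8, "Similarities": 9, "Vocabulary": 11,
    "Picture Arrangement": 8, "Picture Completion": 10,
    "Block Design": 13, "Object Assembly": 12, "Digit Symbol": 6,
}
categories = subtest_categories(record)
print(categories)
```

Dividing each count in the tally by the number of subjects in a group gives percentages of the kind reported for the two groups.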


RESULTS 


The results secured from the analysis 
of subtest deviations are presented in
Table I. Some general interpretations
can be made from these data. On four 
subtests—Vocabulary, Digit Span, Pic- 
ture Arrangement and Picture Comple- 
tion—the performance of the two groups 
of subjects was essentially similar in 
terms of significant and nonsignificant 
deviations. On four other subtests the 
schizophrenics were slightly better than 
the controls: on Arithmetic and Infor- 
mation in terms of more positive devia- 
tions; on Block Design as indicated by 
fewer negative deviations and more 
positive deviations; and on Object As- 
sembly solely because of more frequent 
plus two deviations. The controls were 
only slightly better than the schizo- 
phrenics on the Similarities subtest (in 
terms of fewer negative and more positive
deviations), but showed their greatest
superiority on the Comprehension
subtest. Thus the largest differences be- 
tween the two groups were observed on 
the Block Design subtest (in favor of 
the schizophrenics) and the Comprehen- 
sion subtest (in favor of the controls), 
but the overlapping was great. The con- 
trols had a few more positive deviations 
on the Digit Symbol subtest, but both 
controls and schizophrenics did poorest 
on this subtest (54 and 50 per cent negative
deviations). This latter finding is
similar to that reported by Gilliland et
al. [1]. In terms of absolute deviations
both groups of subjects performed poorly
on the Digit Span, Picture Arrangement
and Digit Symbol subtests with
over 40 per cent negative deviations on
each subtest. Conversely, they both performed
well on the Object Assembly and
Comprehension subtests with similar
percentages of positive deviations. In
contrast to several other previous research
findings, this group of schizophrenics
did best on the Object Assembly
subtest with 50 per cent securing
positive deviations. Magaret [2], however,
has reported somewhat similar
findings.


TABLE I
PERCENTAGES OF SUBTEST DEVIATIONS ON THE WECHSLER-BELLEVUE SCALE
FOR 67 SCHIZOPHRENIC AND 46 CONTROL SUBJECTS

These results appear to indicate a 
large amount of variability in subtest 
performance within the schizophrenic 
group as a whole and also to show 
considerable overlapping between the 
schizophrenic and the control groups. It 
would appear to be a difficult task to 
derive psychometric patterns of high va- 
lidity on the basis of these tentative 
data. For example, although no control 
subject secured a negative deviation on 
the Information subtest of —2, only 
four per cent of the schizophrenic sub- 
jects fell into this category. Although 
the schizophrenics appear to do very well 
on the Object Assembly subtest, only 
seven per cent more schizophrenics ac- 
tually secured positive deviations on this 
subtest than did the control group. In- 
terpretations of other subtest pattern- 
ings are similar. Tentatively it appears 
that the schizophrenic subjects display 
a great deal of intragroup variability
and no clear-cut group patterns are discernible.


OTHER FINDINGS 


An attempt was also made in this 
study to evaluate indices and patterns 
reported by previous investigators. For
example, it has been reported that
schizophrenics usually obtain a higher 
verbal score than performance score on 
the Wechsler-Bellevue Scale [4, 6, 8]. 
In the present group, only 51 per cent 
of the schizophrenic subjects actually 
secured a higher verbal than perform- 
ance score. In 42 per cent of the cases, 
the opposite was true, with seven per 
cent of the subjects having equal per- 
formance and verbal scores. In this con- 
nection the patterning of the controls 
was very similar to that secured from the 
schizophrenic subjects. An attempt was 
also made to evaluate Rabin’s index for 
schizophrenia in which the sum of the 
Information, Comprehension, and Block 
Design sub-tests was divided by the sum 
of the Digit Symbol, Object Assembly 
and Similarities sub-tests. Since Rabin 
[4] in his original publication did not 
list the precise mathematical value for 
his index, all indices above one were 
utilized in this comparison. On this ba- 
sis, 58 per cent of the schizophrenics 
would have secured a significant index, 
but also, 67 per cent of the controls 
would have been included in a similar 
manner. This index therefore does not
appear to differentiate the present two
groups of subjects, and it should be added
that the indices secured from both
groups were similar. Webb [9] secured
very similar results. If a higher index
had been utilized (i.e., 1.25, the mean
value reported by Rabin for his schizophrenic
subjects), 82 per cent of the
schizophrenic subjects would have been
excluded, and thus the measure would
have little practical value. Rapaport’s
[7] contention that Block Design stands 
up better than Picture Completion in 
schizophrenia also was not substanti- 
ated by these findings. This was true in 
34 per cent of the schizophrenics and 28 
per cent of the control subjects, re- 
spectively. Two of Wechsler’s tentative 
criteria which could be checked objectively
were also evaluated. One of these
states that Object Assembly is usually
lower than Block Design in schizophrenic
subjects. In this study only 37
per cent of the schizophrenics revealed 
such a tendency as compared with 28 
per cent of the control subjects. On the 
other hand, Wechsler’s proposed meas- 
ure that the sum of Picture Arrange- 
ment and Completion subtests is less 
than Information and Block Design is 
substantiated in part. This pattern was 
revealed in 63 per cent of the schizo- 
phrenics as contrasted with 39 per cent 
of the controls. Of the measures evalu- 
ated, this appeared to have the most pos- 
sible utility. 
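
The indices evaluated above reduce to simple arithmetic on the weighted subtest scores, as the following Python sketch suggests. It assumes the definitions given in the text (Rabin's ratio with a cut-off of one, and Wechsler's comparison of two sums); the function names and both records are hypothetical and the scores are invented.

```python
def rabin_index(s):
    """(Information + Comprehension + Block Design) divided by
    (Digit Symbol + Object Assembly + Similarities)."""
    numerator = s["Information"] + s["Comprehension"] + s["Block Design"]
    denominator = s["Digit Symbol"] + s["Object Assembly"] + s["Similarities"]
    return numerator / denominator


def wechsler_sum_sign(s):
    """True when Picture Arrangement + Picture Completion is less than
    Information + Block Design."""
    return (s["Picture Arrangement"] + s["Picture Completion"]
            < s["Information"] + s["Block Design"])


def percent_showing(records, predicate):
    """Percentage of records in a group for which the predicate holds."""
    return 100.0 * sum(1 for r in records if predicate(r)) / len(records)


# Two invented records standing in for a clinical group.
records = [
    {"Information": 10, "Comprehension": 7, "Similarities": 9,
     "Digit Symbol": 6, "Object Assembly": 12, "Block Design": 13,
     "Picture Arrangement": 8, "Picture Completion": 10},
    {"Information": 8, "Comprehension": 9, "Similarities": 11,
     "Digit Symbol": 10, "Object Assembly": 9, "Block Design": 7,
     "Picture Arrangement": 9, "Picture Completion": 10},
]

pct_rabin = percent_showing(records, lambda r: rabin_index(r) > 1)
pct_wechsler = percent_showing(records, wechsler_sum_sign)
print(pct_rabin, pct_wechsler)
```

Running the same two functions over the schizophrenic and control records separately yields the pair of percentages that are compared for each measure.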

The extent of scatter, defined as the 
difference between the lowest and high- 
est subtest for each subject, was also 
compared for the two groups. No sub- 
ject had a scatter total of less than four. 
Twenty-five per cent of the schizophren- 
ics and 33 per cent of the controls had 
a scatter range from four to six; 59 per 
cent of the schizophrenics and 52 per 
cent of the controls had a scatter range 
from seven to nine; 16 per cent of the 
schizophrenics and 15 per cent of the 
controls had a scatter range of 10 to 15. 
Thus, there did not appear to be any 
significant differences in the amount of 
scatter between the schizophrenic and 
control subjects. Whereas the highest 
scatter difference of 15 was found on a 
schizophrenic record, differences of 14
and 13 were found only in three control
subjects. These findings are in agreement
with those of Gilliland et al. [1],
and in part, with those of Magaret and 
Wright [3]. 
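
The scatter comparison can likewise be sketched in a few lines of Python. The sketch assumes the definition given above (highest minus lowest weighted subtest score) and the three ranges used in the text; the function names are hypothetical and the four abbreviated records are invented.

```python
from collections import Counter


def scatter_range(weighted_scores):
    """Difference between the highest and lowest weighted subtest score."""
    values = list(weighted_scores.values())
    return max(values) - min(values)


def scatter_band(scatter):
    """Assign a scatter value to the ranges used in the comparison."""
    if scatter < 4:
        return "under 4"
    if scatter <= 6:
        return "4-6"
    if scatter <= 9:
        return "7-9"
    if scatter <= 15:
        return "10-15"
    return "over 15"


def band_percentages(records):
    """Percentage of records falling in each scatter range."""
    counts = Counter(scatter_band(scatter_range(r)) for r in records)
    return {band: 100.0 * n / len(records) for band, n in counts.items()}


# Invented records, abbreviated to three subtests each for brevity.
records = [
    {"Information": 10, "Digit Symbol": 6, "Block Design": 13},
    {"Information": 9, "Digit Symbol": 5, "Block Design": 10},
    {"Information": 12, "Digit Symbol": 3, "Block Design": 14},
    {"Information": 11, "Digit Symbol": 4, "Block Design": 12},
]
percentages = band_percentages(records)
print(percentages)
```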


SUMMARY AND CONCLUSIONS 


On the basis of the data presented in 
this study, it would appear that as yet 
there are no reliable scatter patterns on 
the Wechsler-Bellevue Scale for schizophrenic
subjects. Not only was there a
great deal of variability evident among
the schizophrenic subjects as a group, 
but there was a large amount of over- 
lapping between the schizophrenic and 
hospitalized control subjects. For prac- 
tical diagnostic purposes in an institu- 
tion, such a comparison is a more valid 
one than utilizing a group of normal 
subjects as a control group. It was also 
found that some of the measures report- 
ed by previous investigators were not 
completely substantiated by these findings.
Several possible interpretations
can be offered for the variation in findings
among the different investigators.

1. There is undoubtedly a large
amount of variability in the types of 
schizophrenic subjects used in such 
studies. For example, Rapaport [7] did 
not use any hebephrenic or catatonic 
schizophrenic subjects, and thus any 
generalization offered on the basis of 
his subjects will not hold true for the 
entire schizophrenic group. Also, the
differences in diagnoses and diagnostic
criteria used in different institutions are
an important factor to consider in comparing
such results.

2. Another important factor in ex- 
plaining differences is undoubtedly the 
degree of illness of the particular pa- 
tients used. Where long standing cases 
of schizophrenia are used, undoubtedly 
much more intellectual impairment and 
variability will be found. The present 
group, perhaps, represents a larger 
sample of younger or earlier cases than 
is usually recorded. Until attention is 
given to extent or degree of illness, com- 
parability of results will be hampered. 

3. The time at which the patient is 
tested is another factor contributing to 
variability. Since schizophrenics very 
often are variable in their behavior, 
the precise time at which they are test- 
ed is of some significance. 

4. The degree of motivation and
rapport secured with patients is another
subjective factor which may influence
test results.

Such factors contribute to the varia- 
tions in findings from institution to in- 
stitution, and until attempts are made 
to observe strictly similar conditions, it 
is doubted that any generalized findings 
or patterns will be forthcoming that are 
applicable to any institution. When one 
evaluates the inconclusive patterns and 
critical findings of this study, as well 
as the significant discrepancies among 
the various studies already mentioned, 
it is apparent that at the present time 
there are no scientifically demonstrated 
scatter patterns. For this reason, cau- 
tion must be used in attempting to ap- 
ply such patterns for psychometric tests 
in various institutions.


ANNOTATED BIBLIOGRAPHY ON THE OSERETSKY¹
TESTS OF MOTOR PROFICIENCY²


By RUDOLF LASSNER³


THE TRAINING SCHOOL, VINELAND, NEW JERSEY 


ALTHOUGH many forms of social
achievement depend mainly on 
motor processes, a systematic investiga- 
tion of their genetic development has 
not been undertaken in this country. 
There have, of course, been many stud- 
ies dealing with particular aspects of 
motor development. Among them are 
the well-known motor studies with in- 
fants and young children by Bayley, 
Cunningham, Wellman, Psyche Cattell, 
Gesell, Goodenough, Stutsman, and 
many others. The four latter capitalized 
on their findings in their standard pre- 
school scales of general mental ability. 
Such work has been critically reviewed 
by Anderson [1] and Wellman [15]. It 
can also be located in the various bibli- 
ographies on child development which 
again have been listed exhaustively by 
Allen [2] and Goodenough [8]. A brief 
summarizing account in textbook form 
of our present knowledge of this field has 
been given by Goodenough [9, pp. 227-254,
294-307]. The more general aspect
of psychological consideration of motor
phenomena is available through such
references as are listed in the section on
motor responses of Psychological Abstracts
and the similar section of Psychological
Index as well as through other
bibliographic sources.

1 Since in personal communications the original
author signs his name as shown here, this
spelling has been utilized throughout this article,
though it is in contrast with other spellings
of his name encountered in bibliographic
references.

2 Appreciation is extended to Dr. E. A. Doll
who encouraged this study and offered valuable
suggestions.

3 Now with Guidance Center, Department of
Corrections, San Quentin, Calif.

In numerous instances, here and 
abroad, the relationship of motor per- 
formance to mental ability beyond the 
preschool level was studied. An attempt 
to list such work is beyond the scope of 
this article. The interested reader is re- 
ferred to the sources mentioned above. 
Typical of this line of research are stud- 
ies which have emanated from the Vine- 
land Laboratory, such as those by Doll 
[5], Kreezer and Bradway [13], Glan- 
ville and Kreezer [7], and more recently 
by Heath [10]. These studies have been 
done with feebleminded subjects and do 
not include standardization data from a 
normal population, although such have 
been collected by Heath for his Rail-Walking
Test.⁴

Two broad investigations deserve
closer attention. Brace [3], from the
standpoint of physical education, developed
a Scale of Motor Ability, consisting
of “tests which measure natural
rather than acquired motor ability,
which involved a general functioning of
the whole body in a variety of activities
and which were economical of administration
in point of time and equipment.”
[3, p. 93]. He reported, however, that
success on his Scale is little dependent
upon age (r = .18 for school children of
both sexes, 10-18 years; r = .22 for college
women 13 to 35 years of age) and
hypothesized: “If motor ability is a native
trait, it probably develops with age
to some limit. Until this limit is reached
a positive correlation with age should
be expected” [3, p. 98]. Thus Brace
found his tests (or “stunts”) useful in
classification for physical education,
which was subsequently confirmed by
other investigators. This scale was not
designed to measure motor aptitude developmentally
during the years of elementary
education, although Dimock
[4], giving it annually for three years
to 200 boys within the age range of
twelve to fifteen years, noted an increase
in test scores with both age and
physiological maturity.

4 Personal communication by the author.

Espenschade [6], after reviewing the
numerous studies by physical educators
on the relationship of motor performance
to age, sex, physical growth, and
maturity, summarizes them as follows:
“Motor performance is related to age,
weight, and height during the elementary
and junior high school years but
shows slight correlation with body build.
Increase in performance of girls seems
to cease at approximately fifteen years,
of boys between seventeen and eighteen
years. Sex differences are present at all
times but become marked in adolescence.
Physiological maturity evidently influences
rate of increase in both sexes but
the nature and extent of this influence
has not been determined.” [6, pp. 9-10].
Her own investigation consisted of
seriatim measures of gross motor performance,
obtained for approximately
165 girls and boys of the Adolescent
Study group of the Institute of Child
Welfare (University of California)
over a period of four years. “While
adequate to portray growth trends the
measures are not sufficiently reliable for
the determination of short term changes
in rate of growth during the adolescent
period.” [6, p. 118]. Motor performances
of boys were found positively and
significantly related to all measures of
maturity (chronological, emotional,
physiological), whereas correlations between
motor performances of girls and
all measures of physical growth were
low, and in most cases not statistically
significant. But even so, Espenschade
believes in the clinical value of her motor
tests, anticipating further exploration
in the relationship of motor ability
to social adjustment and to personality
development from selected case studies.
If, as she presumes, special ability and
disability is significant only at the extremes
of the distribution, “this type of
study .... could be used to determine
the degree of variability within the ‘normal’
range, and to suggest procedures
for the early discovery and guidance of
individuals who deviate significantly
from the normal” [6, p. 120]. In other
words, the great overlap of performances
at successive age levels⁵ is realized
by Espenschade as an inherent limitation
in the clinical use of her motor
tests, while their value for the guidance
of certain individuals is not denied.

As may be seen from the foregoing,
neither Brace nor Espenschade, notwithstanding
the undoubted value of
their work, have furnished us with
a scale of motor maturation.

In the absence of any device for the
clinical evaluation of maturational motor
performance, the recent publication
of the Oseretsky Scale from the Portuguese
version⁶ has aroused considerable
interest among psychologists and educators
in this country. In contrast to the
conspicuous lack of a motor scale in our
clinical armamentarium, in Europe the
interest in such an instrument during
the past two decades has been quite
alive. It was stimulated by Homburger’s
[11, 12] outline of age-motor development,
which revealed the mechanisms
of separate movements, but whose differential
motor diagnosis was still based
on observation only. From this interest
at least three published motor scales
have emerged.

5 Norms reported include only the age range
from 12.75 to 16.75 years.

6 First in four installments in The Training
School Bulletin and more recently as a collected
reprint by the Educational Test Bureau
(see bibliographic references: 42, 13).

One, confined to manual ability, was
constructed by the Dutch psychologist
Van Der Lugt and published in French
in 1929 [14]. Although it has been favorably
reviewed, has been in clinical use
in Holland, Belgium, France and in the
U.S.A., and is supported by
standardization data, it has not received
much recognition in this country, probably
because no English translation of
Van Der Lugt’s book has yet appeared.


Another scale was published by a
Russian scientist, Yarmolenko [16], the
method having been suggested by
Dernowa-Yarmolenko. This was part of
the work carried on at the Laboratory
of Age Reflexology of the Bekhterev Institute
for Brain Research. The scale
investigates “life-essential movements”
(walking, grasping, etc.), requires little
equipment, was standardized on school
children between 8 and 15 years, and
yields motor profiles, which in individuals
are rated normal if all points fall
between ± 1 σ. Yarmolenko recommended
her scale, beyond the determination
of the level of the child’s motor
development, for the diagnostic value of
its profiles and for the possible measurement
of improvement as the result of
pedagogical work. Her work with it on
group deviations of different types of
defective children, and on characteristic
motor profiles for the psychoneurotic,
the blind, the deaf and dumb, etc. has
not been made available in the English
language. Moreover, this, her only English
publication, does not include data
on administration and scoring specific
enough to make it usable to us.

The third scale (but the first in the
order of appearance) was that by Oseretsky,
which was published first in
Russian in 1923 from the Psychoneurological
Children’s Clinic in Moscow
(Dr. M. O. Gurewitch, director), and
has been critically evaluated and successfully
employed at various continental
European centers. Unlike Van Der
Lugt’s series, but similar to Yarmolenko’s,
these tests require participation
of various bodily areas in the performances.
Although the present bibliography
is confined to this scale, the interested
student of maturational motor
functions may like to take cognizance of
the other two as well.


Through perusal of the Psychological
Index, Psychological Abstracts, Child
Development Abstracts and of several
publications on the Oseretsky Scale in
foreign languages, fifteen Russian references
(published between 1923 and
1934) were located. Only two of these
have been abstracted before in the English
language and could be included
here with annotations. The other thirteen
are only cited with more or less
exact publication data, according to the
way in which they were encountered.
Effort to obtain the originals through
the American-Soviet Medical Library
has been unsuccessful.


In spite of the wide use of the Oseret- 
sky Scale in nine continental European 
countries none of the publications con- 
tain normative standardization data 
with experimental evidence as are usual 
for test development in this country 
(reports on adequate sampling, central 
tendencies of scores, standard devia- 
tions, measures of validity in regard to 
a suitable criterion, reliability, etc.). 
Trial experience with the Scale suggests 
that the year locations of some of the 
tests require modification, at least as far 
as an American population is concerned. 

In preparing the bibliography these 
guiding principles were employed: 

1. Foreign titles of studies seen in
original are followed by their English
translation.

2. Titles of studies seen in abstracted 
form only, are cited as they have been 
found, except for certain unifications of 
authors’ names. 

3. Also for reasons of uniformity all 
Russian titles are given in English 
translation, irrespective of the language 
in which they have been encountered. 

4. Sources for items not seen in origi- 
nal are duly acknowledged. An italic 
number in parenthesis following a quo- 
tation refers to the particular bibliogra- 
phic reference as the source. 

Bearing in mind that the cooperative 
efforts of many students will be neces- 
sary to adapt and standardize the Ose- 
retsky tests for an American popula- 
tion, this bibliography is offered as a 
contribution to an important area of 
psychological research. 


1. OSERETSKY, N. I. (Russian) A metric 
scale for studying the motor capacity of 
children. 1923. Pp. 24. (24) 

2. OSERETSKY, N. I. (Russian) The feeling of 
space, the method of its examination and 
evaluation. In Organization of Work, 1924.
(7) 

3. GUREWITCH, M. O. (Russian) Concerning 
the problem of methods and aims of re- 
search in motor function. Problems of 
Pedology and Child Psychoneurology, 1924. 
(7) 

4. MERKIN, REGINA. Tests d’Oseretsky; pour 
le développement des fonctions motrices de 
l’enfant. (Oseretsky’s tests of the develop- 
ment of motor functions of the child.) 
Arch. Psychol., Geneve, 1925, 19, 75, 224- 
259. 


This first publication in the French language 
(Switzerland) is preceded by an introduction 
by Claparede. The purpose of the publication 
was to investigate whether the gradation of 
tests corresponded to the development of occi- 
dental children. Except for one case or two, 
this was found to be true. The experimental 
population consisted of 76 children (41 boys, 
35 girls) to whom 77 of the tests were given. 
Those which were repeated at various age lev- 
els were eliminated leaving only 63 different 
tasks. The juggling test at level IX was
found much too difficult for Swiss children and
even for Swiss adults. Oseretsky’s scoring 
method by which the chronological age is taken 
into consideration from ten years on, was dis- 
approved by Merkin. Most tests were passed 
by 75% of the children in the age group to 
which they were allocated; yet, in general, the 
tests at levels VII and VIII were too easy and 
those from X to XV, too difficult. The least 
age differentiation could be obtained between 
VII and VIII years. Sex and socio-economic 
differences were found to exist. The applica- 
tion of the scale to abnormal children was rec- 
ommended. A final suggestion of Merkin was 
to divide the scale into two parts, one with 
tests of age difference and one with tests of 
motor aptitude. 


5. OSERETSKY, N. I. (Russian) The method
of special motor tests. Studies of the Cen-
tral Work Institute, 1925, No. 3. (7)


6. OSERETSKY, N. I. Eine metrische Stufen-
leiter zur Untersuchung der motorischen 
Begabung bei Kindern. (A metric scale 
for studying motor aptitude of children.) 
Z. Kinderforsch., 1925, 30, 300-314. 


This is the first publication of the Scale in
the German language. The author set forth the
observations that normal children and even 
those above the average in the development of 
intelligence often show striking motor defici- 
ency of light and severe types to the point of 
“motor idiocy”. In order to correct certain de- 
ficiencies of the Scale as published in 1923 (in 
Russian) when the standardization group had 
consisted of 410 children only (195 boys, 215 
girls, with the exclusion of underdeveloped 
children and those with somatic or neurologic 
defects), a control group of 1500 normal 
school children and 200 psychotic, nervous and 
psychopathic children was included in this fur- 
ther study. Tests at age levels IV-XV are de- 
scribed with omission of Years X, XII, XIV. 
In the absence of tests for these levels a 
method is suggested to double credits when 
children pass any of the XI, XIII, or XV year 
tests above their life age. The latter is always 
stated as obtaining at the nearest birthday. 
The calculation of motor age is done by the 
Binet method; illustrative examples are given. 
Grades of motor deficiency are distinguished: 
“light”: 1-1½ years below life age; “medium”:
1½-3 years; “great”: 3-5 years; “idiocy”: more
than 5 years below life age. As a limitation 
of the Scale, Oseretsky admits that the tests 
are not completely equivalent in regard to 
their diagnostic significance and in part also 
depend on skills accidentally present. He be-
lieves that their significance is greater in the 
higher levels. 
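
The grades of motor deficiency described above amount to a banded lookup on the gap between life age and motor age. A minimal sketch in Python, assuming the bands read 1 to 1½, 1½ to 3, 3 to 5, and more than 5 years. The function name, the label "none" for gaps under one year, and the assignment of boundary values to the lighter grade are illustrative assumptions, not part of Oseretsky's text.

```python
def deficiency_grade(life_age: float, motor_age: float) -> str:
    """Grade motor deficiency from the gap (in years) between life age and motor age."""
    gap = life_age - motor_age
    if gap < 1.0:
        return "none"    # assumption: gaps under one year fall below the first band
    if gap <= 1.5:
        return "light"
    if gap <= 3.0:       # assumption: a gap of exactly 3 years counts as "medium"
        return "medium"
    if gap <= 5.0:
        return "great"
    return "idiocy"
```

Under these assumptions a child of life age 12 with a motor age of 8 would be graded "great".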


7. OSERETSKY, N. I. UND GUREWITCH, M. O.
Zur Methodik der Untersuchung der mo-
torischen Funktionen. (Concerning meth-
ods of examining the motor functions.)
Mschr. Psychiat. Neurol., 1925, 59, 78-103.


This point scale, intended for the examina- 
tion of adults in psychotechnical situations (vo- 
cational guidance), or of pupils in determining 
certain educational assignments, or of clinic 
cases (e. g. ataxia), includes, besides many 
tests of the year scale, others, thus increasing 
the number of components from 6 to 12. Some 
of the tasks measuring 4 components of the 
year-scale (motor speed, simultaneous, volun- 
tary movements, coordination and synkinesia) 
are different in this point scale, according to 
its different purposes. The 8 components added 
are: 1) formation of motor formulas; 2) auto- 
matized motor action; 3) rhythmic ability; 4) 
speed of mental set; 5) motor strength; 6) 
orientation in space; 7) regulation of innerva- 
tion and denervation; 8) automatic defense re- 
actions. In contrast to the year scale, the 
method described in this article involves more 
elaborate apparatus. Through its application, 
the authors discovered patterns (retardation 
in certain functions, acceleration in others) 
which are characteristic of the schizoid and 
cycloid personality types in children as well 
as in adults. In general, cycloid children were 
found to be more accelerated in motor develop- 
ment than schizoids. 


8. BOROVIKOV, I. U. (Russian) An investiga-
tion of the motor endowment of children 
with a speech defect (logopaths) and of 
deaf-mutes (acupaths). Voprosy izucheni- 
ya i vospitanyia lichnosty, 1926, No. 1-2, 
175-179. 


This investigation of the motor endowment 
of logopaths and acupaths with the Oseretsky 
Scale shows that in them this capacity is less 
developed than in normal children of the same 
age and that this discrepancy becomes more 
pronounced with increasing years. (Child De- 
velpm. Abstr.) 


9. GUREWITCH, M. O. Motorik, Körperbau und
Charakter. (Motor functions, physique 
and character.) Arch. Psychiat., 1926, 76, 
521-532. 


This is a further attempt to correlate varia- 
tions of motor functions (as measured by Ose-
retsky’s point scale described in the previous
study) with Kretschmer’s somato-psychologi- 
cal types. The results are not conclusive. 


10. OSERETSKY, N. I. (Russian) A scale of 
motor capacities. Problems of Pedology 
and Child Psychoneurology, 1926, 2, 334- 
337. (Psychol. Index) 


11. KEMAL, C. Contribution à l’étude des tests
de développement moteur d’Oseretsky. 
(Contribution to the study of Oseretsky’s 
tests of motor development.) Arch. Psy- 
chol., Geneve, 1928, 21, 81, 93-99. 


These experiments were supplementary to 
those published by Merkin in 1925. They were 
made on 110 normal children of both sexes be- 
tween the ages of 4 and 14 years and on 20 
abnormal children. Five points were stressed: 
(1) Although the results corresponded mainly 
to Merkin’s, certain tests were found too diffi- 
cult, others too easy for their age level. This 
may be improved by setting different norms 
for the two sexes. (2) Motor performances of 
boys and girls were alike up to age 8 only; 
after that two different scales become indis- 
pensable. (3) There existed no correlation be- 
tween motor and mental development as meas- 
ured by Terman tests. (4) Regarding abnor- 
mal children, difficulties were encountered in 
setting the problem of the relation between 
motor and mental retardation. True motor re- 
tardation had to be distinguished from the 
physical impossibility of executing a task as 
a consequence of a pathological state of the 
muscular system, or of bodily deformation. 
Mental retardation may also hinder the child 
from understanding what he is to do. A cor- 
relation of .70 between the motor and the in- 
telligence quotients was found. (5) The small 
number of cases did not permit an answer to 
the question whether these are tests of age or 
of aptitude. 


12. OSERETSKY, N. I. (Russian) A method of
group rating of motor abilities in child- 
hood and youth. Moscow: Gosmedizdat, 
1929. Pp. 60. (Psychol. Index) 

13. OSERETSKY, N. I. (Russian) An investi- 
gation of motor ability (Tests). Irkutsk: 
Vlast Truda, 1929. Pp. 8. (Psychol. In-
dex) 


14. OSERETSKY, N. I. Zur Methodik der Un-
tersuchung der motorischen Komponen- 
ten. (Concerning the methods of analyz- 
ing the motor components). Z. angew. 
Psychol., 1929, 32, 257-298. 










An experimental study, using the point scale 
described in a previous article (with Gure- 
witch) to appraise the developmental stage of 
various motor systems. A check list for quali- 
tative description of the subject’s peculiarities
during the performance is given. The 7 com- 
ponents investigated were: static and dynamic 
coordination; motor speed; simultaneous, vol- 
untary movements; synkinesia (all these also 
contained in the metric scale); plus: rhyth- 
mic ability, and motor strength (energy) 
(which are not suitable for year-by-year grad- 
ing). The assumption was that each motor 
component depends on the intact functions of 
a certain system. The experimental group con- 
sisted of 1013 mentally and physically normal 
children and 216 oligophrenic and psycho-
pathic children. The method of serial testing 
is recommended, starting in each component 
from the child’s age level upwards and down- 
wards to establish two successive failures and 
successes, respectively. A “motor profile” giv- 
ing a graphic representation of the plus and 
minus deviation of the various components 
from the mean performance, was found useful. 
Although the data were not segregated by sex 
the impression of the investigator was that 
girls are speedier and more rhythmic, boys 
stronger and better in voluntary, simultane- 
ous movements. Some differences also existed 
according to social, economic and ecological 
background. The diagnostic value of such a 
scale is emphasized along with the usual in- 
telligence tests in vocational guidance; it may 
serve as a criterion of success in motor train- 
ing and as an objective measure of the capac- 
ity of the children and adolescents, thus pro-
tecting them from industrial exploitation. Ose-
retsky, realizing the limitations of his scale,
which includes seven components only, con-
cluded this study by expressing the hope of
having in the future tasks for other motor 
components so that this method may replace 
the year-by-year scale. 
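
The serial testing procedure recommended above can be read as a two-directional search from the child's age level. The sketch below is one possible reading, not Oseretsky's own specification: it assumes the descent stops at two successive successes and the ascent at two successive failures, and that levels run from 4 to 16. The function name, the `passes` callback, and the level bounds are hypothetical.

```python
from typing import Callable, List

def serial_test(age_level: int,
                passes: Callable[[int], bool],
                lowest: int = 4,
                highest: int = 16) -> List[int]:
    """Return the levels passed for one motor component (assumed stopping rules)."""
    results = {}  # cache so that each level is administered at most once

    def tried(level: int) -> bool:
        if level not in results:
            results[level] = passes(level)
        return results[level]

    # Work downward from the age level until two successive successes.
    level, run = age_level, 0
    while level >= lowest and run < 2:
        run = run + 1 if tried(level) else 0
        level -= 1

    # Work upward from the next level until two successive failures.
    level, run = age_level + 1, 0
    while level <= highest and run < 2:
        run = run + 1 if not tried(level) else 0
        level += 1

    return sorted(lvl for lvl, ok in results.items() if ok)
```

For a hypothetical child at level 8 who can pass levels 4, 5, 6, 8 and 9, the call `serial_test(8, lambda lvl: lvl in {4, 5, 6, 8, 9})` administers levels 5 through 11 only and reports levels 5, 6, 8 and 9 as passed.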


15. OSERETSKY, N. I. Methodik der kollektiven 
Prüfung der Motorik bei Kindern und
Minderjährigen. (A group method of ex-
amining the motor functions of children 
and adolescents.) Z. Kinderforsch., 1929, 
35, 332-372. 


Because individual testing is too time-con- 
suming in certain situations, a group method 
was devised and applied to 1200 normal and 
abnormal (mostly oligophrenic) children. Only
six components of the original scale were used,
with the number of tests shortened, namely
five for each component; in sum 30 tests. By
this method 20 to 25 persons could be examined 
simultaneously, whereas for younger or prob- 
lem children a reduction to 12 to 15 seemed 
more desirable. A convenient standard order 
of the administration is described whereby the 
tasks requiring sitting at tables are performed 
first, followed by those requiring an empty 
room. The scoring system is by points, with 
age norms corresponding to sum of points. 
Standards, however, were established for two- 
year groups only. Norms for interpreting ad- 
vancements and retardations are given as well. 
The group examination takes 40-45 minutes 
for a group of normal children, over an hour 
for retarded children. In cases of limited time 
certain abbreviations of the scale according to 
the age group are suggested. 


16. GUREWITCH, M. O. (Russian) Psychomo-
tor phenomena. Part I. Moscow: Gosme- 
dizdat, 1930. (Psychol. Index) 

17. OSERETSKY, N. I. (Russian) Psychomotor
phenomena. Part II. Moscow: Gosmediz-
dat, 1930. (Psychol. Index) 

18. GUREWITCH, M. O. (Russian) The develop- 
ment of human psychomotor functions. 
Mater. I. vsesoju. sez. isuch. poved. che- 
lov., 1930, 184. (Psychol. Index)

19. OSERETSKY, N. I. UND GUREWITCH, M. O. 
Die konstitutionellen Variationen der Psy- 
chomotorik und ihre Beziehungen zum 
Körperbau und zum Charakter. (Consti-
tutional variations in psychomotor ability 
and their relationships to somatic consti- 
tution and character.) Arch. Psychiat. 
Nervenkr., 1930, 9, 286-312. 


In order to gather material on the relation- 
ships existing between somatic constitutions 
and character the authors examined the motor 
ability of 4858 subjects with Oseretsky’s point 
scale (11 components) and differentiated them 
according to Kretschmer’s somatic types. Re- 
lationships between character, motor ability, 
and somatic constitutions were established. 


20. GUREWITCH, M. O. (Russian) On the
structure of human motor functions, and 
their development with increasing age. 
Pedol., 1931, No. 2, 30-34. (Psychol. In-
dex) 

21. OSERETSKY, N. I. Psychomotorik: Meth-
oden zur Untersuchung der Motorik.
(Psychomotor ability; methods of study-
ing motor functions.) Beih. Z. angew.
Psychol., 1931, 17, H.57, Pp. 162.


In Part II of this monograph the revised 
scale for the measurement of motor ability,
designed for individual testing, is described. 
It is applicable to children from ages 4 to 16. 
Crude norms are indicated, and a method of 
group testing is given. Parts I and III of the 
monograph deal with systematic observations 
of various general motor characteristics, such 
as body posture, etc., with other tests of spe- 
cific aspects of movement, as rhythm, tonus, 
etc., and with methods of recording move- 
ments. 


22. OSERETSKY, N. I. (Russian) The problem
children. Moscow: Uchpedgiz, 1932. Pp. 
224. (Psychol. Index) 


23. YARMOLENKO, AUGUSTA. The motor sphere 
of school age children. J. genet. Psychol., 
1933, 42, 298-316.


In her introductory statements, before pre- 
senting her own method of motor testing, Yar- 
molenko pointed to the (approximately) simul- 
taneous appearance of Oseretsky’s year scale 
of motor tests and of Brace’s scale of motor 
development. She found Oseretsky’s tests in- 
adequate for the research in which her labo- 
ratory was engaged at that time, because they 
do not permit of arriving at a differential mo- 
tor diagnosis. “It is not enough to say that 
a child’s motor coefficient is normal for his 
chronological age, or that he surpasses it or 
does not reach it. The data must be analyz- 
able” (p. 300). 


24. DECROLY, J. ET BRATU, A. E. La mesure
de la motricité chez l’enfant et l’adoles-
cent; Echelle d’Oseretsky. (The measure- 
ment of motor functions in children and 
adolescents; the Motor Scale by Oserets- 
ky.) Rev. Pedag., 1934, 4. Pp. 20. 


This is a French translation (published in 
Belgium) of the revised (1931) scale. It in- 
cludes the general scoring scheme and the 
methods of establishing the “motor age” (like 
the German translation); also the differential
weighting of successes in the high age groups 
according to life age, since tests at levels XI, 
XIII, and XV do not exist. Practical experi- 
ences with the scale were gained at the Men- 
tal Hygiene Clinic at Brussels. In spite of 
some imperfections six pedagogical advantages 
of the scale are enumerated as follows: (1) 
Adapting the pupil’s work to his motor capac- 
ity, (2) devising remedial exercises, (3) in- 
troducing more rational teaching methods for 
a harmonious development of the motor func- 
tions, (4) checking by repeated examinations 
the value of pedagogical processes, (5) exami-
nation of certain components may indicate the
degree of fatigability of a certain pupil, (6)
greatest value for children in need of special 
instructions. 


25. AISENBERG, B. AND MILLER, L. S. (Rus- 
sian) The psychoneurotic child in school 
and school-shop work. Psychoneurotic 
Children, Leningrad: 1934, 113-120. (Psy-
chol. Index)

26. OSERETSKY, N. I. AND PAYOVA, E. (Rus-
sian) Concerning the study of motor ca- 
pacity of children with illnesses of the 
metamere system and defective peripher- 
al apparatus. Sovetsk. Nevropatol., 1934,
No. 7, 113-118. 


The Scale was used to investigate the motor 
capacity of children with diseases and defects 
(30 with poliomyelitis, 51 suffering from tu- 
bercular spastic paralysis of joints, amputated 
extremities and the Kashin-Beck disease). 
Static coordination and simultaneous move-
ments were conserved in all cases. The dy- 
namic coordination of the arms was better in 
cases of poliomyelitis than in all the other in- 
vestigated patients. Speed and tempo of move- 
ment were decreased in poliomyelitis. A num- 
ber of synkinesias were observed. The investi- 
gation of the mimicry and handwriting of chil- 
dren with poliomyelitis showed the defective 
regulation of innervation and denervation. The 
motor inferiority of poliomyelitis could be lo- 
calized in both pyramidal and extrapyramidal 
systems. (Psychol. Abstr.) 


27. OSERETSKY, N. I. AND PAYOVA, E. (Title
not given; apparently a treatise on the
same subject as the foregoing, probably
in French.) Les Annales de l’Enfance,
No. 75, 1934. (28) 


28. OSERETSKY, N. I. AND PAYOVA, E. Die Psy-
chomotorik poliomyelitischer Kinder; zur 
Lokalisation der Poliomyelitis. (Psycho- 
motor ability of children with poliomye- 
litis; concerning the localization of polio- 
myelitis). Z. Kinderforsch., 1935, 44, 253- 
269. 


The 1931 revision of the Scale was used with 
50 cases of poliomyelitis, 22 boys and 28 girls. 
Neither feebleminded children nor those with 
personality deviations (except for certain re- 
actions of inferiority feelings due to the physi- 
cal defect) were included. Those tests which 
were not serviceable for 90% of these children 
(static and general dynamic coordination) 
were replaced by observation in daily life. 
Tests of dynamic coordination of hands were
found suitable for the majority of the chil- 
dren. Results from tests and information 
showed that static and general dynamic coor- 
dination were unimpaired, but impairment of 
dynamic coordination of hands, considerable 
disturbance in speed of movement, and a great 
number of synkinesias could be demonstrated. 
Certain conclusions regarding the pathology 
of the disease were drawn from the results of 
the study. 


29. KOPP, HELENE. Les troubles de la parole
dans leurs rapports avec les troubles de
la motricité. (Speech disorders in rela-
tion to motor disorders.) Evolut. psy-
chiat., 1935, 2, 77-102.


The author used the Oseretsky Scale for de- 
termining the motor ability of stuttering and 
lisping children. In general, the stutterers
seemed to be better endowed than the lispers 
from the motor point of view. Stutterers from 
8 to 13 years of age had a motor rating which 
was near the normal; on the whole, it was 
found that the stutterers gave negative re- 
sults particularly for the mimicry tests and 
for associated movements, a fact that indicat- 
ed a deficiency in the function of the extra- 
pyramidal system. Whether dealing with stut- 
terers or lispers, the author found that the 
motor deficiency as measured by the Scale is 
part of the condition which brings about these 
two large groups of speech disorders. 


30. ABRAMSON, JADWIGA ET KOPP, HELENE.
L’échelle métrique du développement de
la motricité chez l’enfant et chez l’adoles-
cent par N. Oseretsky; traduite et adap-
tée. (The metric scale by N. Oseretsky
of the development of motor functions in
the child and the adolescent; translated
and adapted.) L’Hygiène Mentale, 1936,
31, 3, 53-75.


This is a new translation of the Russian text 
of 1931 which was found necessary because of
variations between the previous translations,
in particular the German of 1931 and the Bel-
gian of 1934. Since 1934 the translators had
been using the scale at the Neuropsychiatric 
Child Clinic of the Medical Faculty in Paris. 
They emphasize the value of a profile of motor 
components which can be obtained through the 
Scale, in addition to the total development of 
the motor level. 


31. SPADAVECCHIA, S. Contributo alla conos-
cenza della costituzione motoria. (Con-
tribution to the knowledge of motor con- 
stitution). Rass. Studi. psichiat., 1936, 25, 
384. 


The Oseretsky tests were used with healthy 
and unhealthy children. Many regional differ- 
ences among the Italian children were noted. 
It was found that girls surpassed boys in re- 
spect to motor ability in the earliest years. In 
mentally abnormal children no close connec- 
tion between mental and motor development 
could be established. The author proposed the 
use of the tests for the investigation of the 
motor constitution of mental defectives and
for a classification of common motor insuffici- 
encies. (Psychol. Abstr.) 


32. ABRAMSON, J. ET LE GARREC, S. Notes 
sur quelques correlations psychomotrices 
chez les ecoliers normaux. (Notes on cer- 
tain psycho-motor correlations in normal 
school children.) L’Hygiène Mentale,
1937, 32, 1-8. 


The Scale was found reliable for determin- 
ing motor rating, particularly for analyzing 
the essential motor elements of static and dy- 
namic coordination and the presence or ab- 
sence of synkinesia. Correlations between the 
intelligence quotient and the motor quotient 
were .31 for girls and .30 for boys. The tests 
were found to be better adapted to boys than 
to girls. (Psychol. Abstr.) 


33. KOPP, HELENE. Le bégaiement. (Stutter-
ing.) Evolut. psychiat., 1937, 3, 3-21.


Stuttering is a psychomotor disorder, being
both a neurosis and a motor disturbance. Bas-
ing her work on this principle, Kopp studied 
muscle tonus and the neurology of movement 
by means of motor tests, using those by Kuhl- 
mann, Gesell, Itard and Decroly through the 
age of four, and the Oseretsky tests after that 
age. These tests being based on the semeiology 
of the nervous system furnished a relatively 
complete picture of the degree of motor effici- 
ency present. Besides proving this point, the 
article includes case studies and the author’s 
conclusions regarding re-education of stutter- 
ers through the development of spoken and 
song rhythm. 


34. LEEUW-AALBERS, A. J. DE. Enkele critische
beschouwingen over de metrische skala 
van Oseretsky. (Some critical remarks 
concerning the metric scale of Oseretsky.) 
Ned. Tijdschr. Psychol., 1938, 6, 215-230. 
The author found it necessary to modify the
scale in order to make it valid for children in
the Netherlands. (Psychol. Abstr.)


35. VAN DER LUGT, MARIE J. A. Un profil
psycho-moteur; d’aprés une étude moto- 
métrique de l’habileté manuelle. (A psy- 
chomotor profile; from a metric study of 
manual ability.) Paris: Aubier (Editions 
Montaigne), 1939. 


This monograph deals with a different series 
of tests, devised by Van der Lugt in Holland. 
In her introduction the author offers an ex- 
haustive survey of the field of motor testing, 
up to the time of her study. In discussing the 
Oseretsky tests her criticisms are the follow- 
ing: (1) The tests have not been selected dis- 
criminatively, (2) the diagnostic significance 
varies from test to test, (3) insufficient allow- 
ance has been made for sex differences, (4) 
practice opportunity, as afforded in certain en- 
vironments, may influence a given subject’s 
performance, (5) the technique of administra- 
tion is too complicated, the number of tests is 
too large, and the instructions are not precise 
enough. 


36. JUARROS, C. Valor práctico de las pruebas
colectivas de Oseretsky para la determi- 
nacién de la edad motora. (Practical val- 
ue of Oseretsky’s group tests for deter- 
mining motor age.) Psicotecnia, 1939, 1, 
40-60. 


It was found that the individual method of 
giving the Oseretsky tests for determining mo- 
tor age or physical development required too 
much time. The writer revised the battery of 
tests so that they could be given as group tests. 
Six groups of motor tests were used: static 
coordination, dynamic coordination, speed of 
movement, simultaneous movements, force, and 
precision of movement. Tests included ex- 
amination for separate and coordinate hand 
and leg movements. The writer enumerates 
the practical value of such tests in the physi- 
cal education of normal children and in the 
determination of physical characteristics of 
psychotic and neurotic children. (Psychol. 
Abstr.) 


37. ESPENSCHADE, ANNA. Motor performance 
in adolescence; including the study of re- 
lationships with measures of physical 
growth and maturity. Monogr. Soc. Res. 
Child Develpm., 1940, 5, No. 1. 


Reviewing the literature in her introduction,
Espenschade briefly discusses among others
Oseretsky’s tests (revised scale) and quotes 
(in German) Oseretsky’s definition of the Ger- 
man term (Motorik). She considers his test 
series insufficient for a complete measurement 
of Motorik. Tests of rhythm and tempo, of 
strength of movement and certain other aspects 
of behavior should be included. “Since no tests 
are available for some of these, and others re- 
quire very complex apparatus, they cannot be 
included in the metric scale. An incomplete 
analysis of data is presented and the exact 
basis for selection and placement of tests is 
not stated” (p. 6). 


38. JUARROS, C. El nivel motórico; edad mo-
tora. (Stages of motor ability; the motor
age.) Madrid: J. Morata, 1941.⁷


This is an extensive description of the Scale 
making it available in the Spanish language, 
and of its application in the School for the 
Abnormal in Madrid. Further discussions in- 
clude basic factors of motor ability, training 
habits, neurotic disturbances, and the prob- 
lem of children of subnormal development and 
growth. (39, Psychol. Abstr.) 


39. DA COSTA, MARIA I. L. Testes de Oseret-
sky: método, valor e resultados. Sua
adaptação em língua portuguesa. (The
Oseretsky tests; method, value and re-
sults. Portuguese adaptation.) A Criança
Portuguesa, 1943, 2, 193-228.


This Portuguese adaptation of the year scale 
facilitates its comprehension by diagrams and 
also features a score sheet as used in the In- 
stituto de António Aurélio da Costa Ferreira.


40. KOPP, HELENE. The relationship of stut-
tering to motor disturbance. Nerv. Child,
1943, 2, 107-116. 


This is, as far as we were able to find, the 
first partial publication of the Scale in the 
English language. The author reports on a 
study of the motor development of 450 stutter- 
ers carried out at the Annex Clinic of Child 
Neuropsychiatry of the University of Paris 
(see her previous French publication). The 
Oseretsky tests are briefly described and the 
tests for IV and V years are shown as exam- 
ples. In this study the complex picture of the 
functioning of the various motor systems was 
of greater interest than the actual motor level. 
Kopp, therefore, altered the procedure, starting
with the tests for IV years, whatever the chrono-
logical age of the subject, and ascending the
scale until negative results were obtained for
all tests at the particular age level. Sample
records of motor examinations are given and
the motor age, besides the mental age, is in-
cluded in the detailed records on 28 children.
The main purpose of this study was to place
special emphasis on correlation between stut-
tering and motor disturbances. The conclu-
sion was reached that stuttering is not a psy-
chological but a neurological disorder. In spite
of certain shortcomings the author found the
Oseretsky tests very helpful for such investi-
gations.


7 One source indicates 1942.


41. KOPP, HELENE. II. Oseretsky Tests.
Amer. J. Orthopsychiat., 1946, 16, 114- 
119. 


This is one of four contributions to a psy- 
chosomatic study of 50 stuttering children car- 
ried on by the Board of Education in New 
York City. Dr. Kopp, as in her studies in 
Paris, found, through using the Oseretsky 
Scale, marked disturbances in the motor func- 
tion. Even in cases not exhibiting motor re- 
tardation, analysis of their scores revealed uni- 
form deficiency in the maturity of the extra- 
pyramidal system, shown by failure in tests 
for synkinetic movements, mimicry, rhythm, 
and coordination. Using Oseretsky’s four 
broad categories of motor retardation Kopp 
found that 46% of the stuttering children test- 
ed as motor “idiots”, 26% showed “severe re- 
tardation”, 20%, “motor deficiency”, and only 
6% showed “normal” motor development, one 
child testing “superior”. Greatest deficiency 
was exhibited in tests for synkinetic move- 
ment and static coordination. Best results 
were obtained on tests of general dynamic co- 
ordination which include large propulsive 
movements. 
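
The percentages reported above can be checked against the sample size. A minimal sketch, assuming the percentages refer to all 50 children in the study (the annotation does not say so explicitly); the function name and the grade labels used as dictionary keys are illustrative only.

```python
def counts_from_percentages(n_children: int, percentages: dict) -> dict:
    """Convert whole-number percentages into head counts, rejecting fractional children."""
    counts = {}
    for grade, pct in percentages.items():
        count, remainder = divmod(n_children * pct, 100)
        if remainder:
            raise ValueError(f"{pct}% of {n_children} is not a whole number")
        counts[grade] = count
    return counts

kopp = counts_from_percentages(
    50,
    {"idiot": 46, "severe retardation": 26, "motor deficiency": 20, "normal": 6},
)
kopp["superior"] = 1  # the single child reported as testing "superior"
total = sum(kopp.values())
```

On that assumption the four percentages correspond to 23, 13, 10 and 3 children, and with the one "superior" child the categories account for all 50 cases.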


42. DOLL, E. A. The Oseretsky Scale. Amer.
J. ment. Def., 1946, 50, 485-486. 


In this “current field note” Doll states the 
traditional interest of the Vineland Labora- 
tory (since Goddard) in the measurement of 
motor aptitudes of the feebleminded, calls at- 
tention to the conspicuous motor awkwardness 
of the exogenous defective, and announces the 
forthcoming publication of a translation of the 
Scale from the Portuguese version. He stresses 
the importance of an American adaptation and 
standardization for a better understanding of 
the limitations of mentally deficient children 
and adults. 


43. DA COSTA, MARIA I. L. (Translated by E.
J. Fosa.) The Oseretsky tests; method,
value and results. (Portuguese adapta-
tion.) Train. Sch. Bull., 1946, 43, 1-18, 27-
38, 50-58, 62-74.


The first English translation of the complete 
scale was sponsored and edited for technical 
content and interpretative comment by E. A. 
Doll. In his preface the editor comments on 
the need for the clinical evaluation of develop- 
mental motor performances, particularly in the 
field of mental deficiency, and on the apparent 
relation of motor defect to certain mental pat- 
terns. He also anticipates the necessity of 
modification in content and procedure for an 
American standardization of the Scale, in par- 
ticular that of freeing some of the tests from 
their intellectual “loading”.


44. DOLL, E. A. (Sponsor and editor). The
Oseretsky tests of motor proficiency; a 
translation from the Portuguese adapta- 
tion. Minneapolis: Educational Test Bu- 
reau, 1946. Pp. 47. 


E. J. Fosa’s translation of the Da Costa 
Portuguese adaptation which had appeared in 
four subsequent issues of The Training School 
Bulletin (see 43) has been published as a col- 
lected reprint. A few corrections of the pre- 
vious text and a note on some preliminary 
trial of the Scale in the Vineland Laboratory 
conclude this brochure. 
THE VALIDITY OF SOME ABBREVIATED INDIVIDUAL


UNDER the demands for both speed
and volume that marked the pro-
cessing of recruits in World War II,
clinical psychologists in both the Army
and Navy found it necessary to con-
dense and abbreviate the longer, indi-
vidual intelligence testing techniques of
civilian practice. As a result, many
short intelligence scales were used by
military psychologists [3]. Owing to
the limitations imposed by the lack of
both time and available experimental
subjects, most of these abbreviated tests
were not validated upon sample popula-
tions either numerous enough or suffi-
ciently randomly selected to satisfy the
dictates of desirable procedure. Their
acceptance and use in the military serv-
ices often rested as much upon neces-
sity and faith as it did upon adequate
experimental verification.

The present study was designed to
test the validity, upon a large experi-
mental population, of five previously
used abbreviated intelligence scales, and
to devise some new ones for future use.
In selecting our five scales for further
validation, we kept in mind the extent
of their previous usage in military and
civilian practice, the size of the validity
coefficients previously reported, and
their general promise for clinical use.

The tests selected were the following:


1 This study is part of a larger project sub- 
sidized by the Office of Naval Research under 
their policy of encouraging basic research. The 
opinions expressed, however, are those of the 
individual authors and do not represent the 
opinions or policy of the Naval service. 
1. The CAS abbreviation of the Wech-
sler-Bellevue scale, consisting of the sub- 
tests for comprehension, arithmetic, and 
similarities. Originally proposed by Ra- 
bin [6], this test has been widely used 
in both military and civilian practice. 

2. The CA abbreviation of the Wech- 
sler-Bellevue, consisting of the subtests 
for comprehension and arithmetic. Pro- 
posed by Cummings, MacPhee, and 
Wright [1], this test was used exten- 
sively at the Great Lakes Naval Train- 
ing Station during the war. 

3. The PA-DS abbreviation of the 
Wechsler-Bellevue, consisting of the 
subtests for picture arrangement and 
digit span. This has been proposed by 
Gurvitz [2] who makes a strong claim 
for its efficiency. 

4. The Kent, 10-item, Revised E.G.Y. 
[4]. This is an abbreviation by Grace 
Kent of her earlier Oral Emergency Test 
[5]. It was produced early in the war 
in answer to military requests, and was 
widely used in the Naval service. 

5. An abbreviated 15-item vocabulary
test drawn by Thorndike [7, Test 5] from
the vocabulary list of the Stanford-Binet,
1937 revision, Form L. This test has not
had adequate independent standardization.

All the tests were administered to 528
Naval recruits at the Great Lakes Naval
Training Station.² These recruits may
be considered a “normal” population as 

2 Our thanks are due the staff at the Great
Lakes Training Station for their untiring and
complete cooperation in this study.
all had survived the neuropsychiatric
screening procedures at the Station. 
They were randomly selected and may 
be considered representative of the en- 
tire recruit population at that installa- 
tion at the time. The age range was from 
17 to 24 years with a mean of 17.45 and 
a standard deviation of .26. The mode 
was clearly at 17 with only 20 cases over 
18 years old. The school grade com- 
pleted ranged from grades 6 to 15 with 
a mean of 11.06 and a standard devia- 
tion of 1.49. The General Classification 
Test scores ranged from 29 to 76 with 
a mean of 56.30 and a standard devia- 
tion of 12.25. 

It is clear that this is a superior 
group and surpasses the average educa- 
tional attainment of the population of 
the country at large. In terms of intel- 
lectual ability, the distribution is skewed 
to the left with a piling up of cases at 
the higher end. Despite this skewing, 
the distribution seemed to cover a suf- 
ficient range of ability and to be suffici- 
ently normal to reflect validly any rela- 
tionships existing in the test data. 

To check the adequacy of our distri- 
bution, however, a normally distributed 
population of 214 cases, theoretically 
representative of the population of the 
country at large, was withdrawn from 
the larger sample of 528 cases. The 
smaller sample was obtained by dupli- 
cating the war-time distribution of Gen- 
eral Classification Test scores; and al- 
lowing for the absence of detectable 
mental deficiency, it represents reason- 
ably closely the range of intellectual 
ability in the entire population for this 
age range and sex. All correlations cal- 
culated for the larger sample were du- 
plicated on the smaller sample and no 
significant differences appeared. 

The main body of the testing was 
done by the three junior authors, all of 
whom have had extensive military and 
civilian clinical testing experience. To
ascertain whether any significant
individual differences were appearing in
their testing performances, a group of 
72 subjects, equated for General Classi- 
fication Test score, was selected for each 
tester. Since there were no significant 
differences in the mean test scores of 
these three groups, we are assuming 
that test administration is a relatively 
constant factor in this study. 
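
One conventional way to check that the three testers' groups do not differ in mean score is a one-way analysis of variance. The sketch below is an illustration only: the study does not say which significance test was used, and the function name and the sample scores are hypothetical.

```python
from statistics import mean

def one_way_f(groups):
    """Return the one-way ANOVA F ratio for a list of score lists."""
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    k = len(groups)        # number of testers
    n = len(all_scores)    # total number of subjects
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical scores for three testers (not data from the study).
tester_scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_ratio = one_way_f(tester_scores)
```

The resulting ratio would then be compared with the tabled F value for k - 1 and n - k degrees of freedom.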


The main criterion used in the study 
was the General Classification Test 
(GCT), Form III, which was adminis- 
tered to all the subjects within a few 
days of their testing with the individ- 
ual scales. The GCT, Form III, consists 
of subtests of sentence completion, op- 
posites, and analogies. It is representa- 
tive of, and correlates highly with, the 
standard paper-and-pencil tests of intel- 
ligence, and its importance in Naval 
classification procedures makes it a par- 
ticularly vital criterion for any Naval 
individual intelligence test. 

A second criterion was the abbrevi- 
ated Wechsler-Bellevue scale consisting 
of the five subtests constituting the vari- 
ous abbreviated scales mentioned above. 
A correlation of +.96 obtained on a 
group of 46 high-school students be- 
tween this abbreviated scale and the 
complete Wechsler-Bellevue³ indicates
that we are relatively safe in using this 
criterion as representative of the com- 
plete Wechsler-Bellevue scale. It must 
be kept in mind, however, that all cor- 
relations between this criterion and the 
three abbreviated Wechsler-Bellevue 
scales we are evaluating are spuriously 
high owing to the participation in the 
criterion of the tests being evaluated. 
The correlation between the five subtest 
Wechsler-Bellevue and the GCT was 
+.77 for the group of 528 Naval recruits. 


3 We are grateful to Mr. Paul Young of the
Evanston High School and to Mrs. Agnes
Plenk for their assistance on this phase of the
study.
Table I gives the intercorrelations of
both subtests and abbreviated scales, as
well as their correlation with the two
criteria. All correlations with GCT and
with the Wechsler-Bellevue criterion
were checked for linearity. Some of the 
correlations with GCT departed from 
linearity, but in these cases the value of 
eta proved to be of the same order as r. 
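
The linearity check described above compares the product-moment r with the correlation ratio eta; when the regression is linear the two agree, and eta exceeds r when it is not. The following is a minimal sketch of that comparison, assuming each distinct x value is treated as one class interval. The function names and the six-case data sets are hypothetical, not taken from the study.

```python
from collections import defaultdict
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_ratio(x, y):
    """Eta of y on x, treating each distinct x value as one class."""
    groups = defaultdict(list)
    for a, b in zip(x, y):
        groups[a].append(b)
    grand = sum(y) / len(y)
    ss_total = sum((b - grand) ** 2 for b in y)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values()
    )
    return sqrt(ss_between / ss_total)

# Hypothetical data: class means rise linearly in the first set,
# and rise then fall in the second.
x = [1, 1, 2, 2, 3, 3]
linear = [1, 3, 3, 5, 5, 7]
curved = [1, 3, 5, 7, 1, 3]
```

For the linear set, eta and r coincide; for the curved set, r vanishes while eta remains high, which is the pattern a linearity check is meant to detect.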

Of all five abbreviated batteries used, 
the Kent Revised E.G.Y. shows the least
agreement with the criteria, correlating
only +.50 with the Wechsler-Bellevue
and +.58 with GCT. While these
correlations are significant, it is evident
that the abbreviated Kent test is not an
efficient means of predicting performance
in terms of either criterion. On the
possibility that the Revised E.G.Y. might
function differently on different ranges 
of intelligence (i.e., be better with the 
lower levels of intelligence), we split 
our distribution on the basis of GCT 
scores and ran separate correlations for 
each segment. The test predicted no 
better with the bottom half of the dis- 
tribution than with the top half. 

The Thorndike vocabulary test yield- 
ed our highest correlation with the 
GCT, namely, +.80. In view of the ob- 
vious verbal weighting of the GCT, this 
is not surprising except in the light of 
the brevity of the Thorndike list (15 
words). Its correlation of +.66 with 
the Wechsler-Bellevue is fair. On the 
whole, vocabulary emerges as a good 
test in terms of our criteria. 

All the Wechsler-Bellevue abbrevia- 
tions show good agreement with both 
criteria except for a relatively poor cor- 
relation between the Gurvitz Picture Ar- 
rangement-Digit Span scale and GCT. 
The CAS scale correlates +.91 with
Wechsler-Bellevue and +.78 with GCT;
CA correlates +.87 with Wechsler-Bellevue
and +.70 with GCT; PA-DS correlates
+.82 with Wechsler-Bellevue and
+.52 with GCT. We must point out
again that all correlations between the
Wechsler-Bellevue subtests and the
Wechsler-Bellevue criterion are spuriously
high owing to the inclusion in the
criterion of the tests being evaluated.
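
The size of this spurious part-whole effect can be illustrated under simplifying assumptions that are ours, not the study's: every subtest has unit variance and every pair of subtests shares one common intercorrelation rho. The function name and the values of rho below are hypothetical.

```python
from math import sqrt

def part_whole_r(m, k, rho):
    """Correlation between the sum of m subtests and the sum of all k
    subtests (the m included), assuming unit variances and a common
    intercorrelation rho among subtests."""
    cov = m + m * (k - 1) * rho          # covariance of part with whole
    var_part = m + m * (m - 1) * rho     # variance of the m-subtest sum
    var_whole = k + k * (k - 1) * rho    # variance of the k-subtest sum
    return cov / sqrt(var_part * var_whole)

# Three subtests scored against a five-subtest criterion containing them.
r_unrelated = part_whole_r(3, 5, 0.0)   # subtests wholly uncorrelated
r_related = part_whole_r(3, 5, 0.4)     # hypothetical moderate overlap
```

Even with wholly uncorrelated subtests, a three-subtest scale would correlate about .77 with a five-subtest criterion that contains it, which is why the Wechsler-Bellevue coefficients must be read with caution.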

In addition to agreement with a se- 
lected criterion, other factors such as 
ease of administration, time consumed, 
simplicity of scoring, etc. must be con- 
sidered in selecting a test for any spe- 
cific use. Susceptibility to cultural in- 
fluence is also an important criterion in 
evaluating a test, particularly where it 
is to be used with culturally handicapped 
groups. While our material is not suit- 
able for a tightly-controlled study of 
cultural influences, we have drawn some 
regional comparisons between educa- 
tionally favorable urban and rural areas 
and less favorable ones. The GCT,
Vocabulary test, and Wechsler-Bellevue
verbal subtests all show small but
statistically significant differences in favor
of the educationally enlightened areas.
The Kent and Wechsler-Bellevue
performance subtests do not show such
differences. Apparently these last are less
culture-bound, and therefore might be 
considered preferable for educationally 
handicapped subjects. 

If we view Table I as a reservoir of 
test materials, the possibility of con- 
structing new test combinations or bat- 
teries is immediately evident. The high 
correlation of vocabulary with GCT sug- 
gests vocabulary as a promising mem- 
ber of any abbreviated battery. We have 
tried several new combinations of com- 
prehension (C), similarities (S), arith- 
metic (A), picture arrangement (PA), 
digit span (DS), and vocabulary (V). 

Table II presents the multiple corre- 
lations of eight of these combinations 
(C-V-S, C-V, V-DS, V-A, V-PA, C-A- 
PA, S-DS-PA, and S-PA) with the 
Wechsler-Bellevue five-subtest battery
and GCT. All eight of these batteries
show good agreement with both criteria. 
The clinician may select that
combination that best fits the demands of a
particular testing situation. Thus V-PA
agrees +.80 with both criteria and com- 
bines a verbal with a performance sub- 
test. 
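
For a battery of two subtests, a multiple correlation can be obtained in closed form from three coefficients of the kind listed in Table I. The sketch below uses that closed form rather than the iteration method cited under Table II, and the coefficients passed to it are hypothetical, since the body of Table I is not reproduced here.

```python
from math import sqrt

def multiple_r_two(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors.

    r1, r2: correlation of each predictor with the criterion.
    r12:    correlation between the two predictors.
    """
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return sqrt(r_squared)

# Hypothetical coefficients, not values from Table I.
big_r = multiple_r_two(0.8, 0.5, 0.5)
```

With these illustrative values the second subtest raises the correlation only slightly above .80, which shows how little a battery gains from a subtest that overlaps heavily with one already included.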

Perhaps the most promising of these 
short scales is C-V-S. It agrees +.87
with the Wechsler-Bellevue criterion
and +.86 with GCT. It has been de- 
signed for diagnostic potentiality since 
it offers a comparison between vocabu- 
lary score, which is relatively insensi- 
tive to psychopathosis, and scores for 
comprehension and similarities, both of 
which are sensitive to psychopathosis. 
Such “scatter,” based upon the discrep- 
ancy between vocabulary level and other 
intellectual functions, has proved valu- 
able for clinical diagnosis on the longer 
tests. 

Standard scores and norms for the 
Wechsler-Bellevue subtests in Table II 
are available in Wechsler’s manual [8]. 
In order that the batteries suggested in 
Table II may all be available for use, 
some comparable standard scores and 
norms are necessary for Vocabulary. 
Using Wechsler’s techniques, we have 
calculated standard scores for Vocabu- 
lary on a group of 724 Naval recruits 
(including the 528 which are the basis 
for this study) between the ages of 17 
and 24. The mean GCT score is 49.20 
with a standard deviation of 11.90. This 
compares favorably with a Naval war- 
time mean of 50 and standard deviation 
of 10, and suggests that our sampling 
of 724 cases is fairly representative of 
the male population at large in this age 
range. We therefore suggest that our 
standard scores can be used with Wech- 
sler’s norms if desirable. Such a prac- 
tice is certainly not statistically ideal, 
but is typical of accepted clinical prac- 
tice and certainly is no less desirable 
than most of the extrapolation,
interpolation, and translation currently
practiced with normative materials. Our
standard scores for Vocabulary are pre- 
sented in Table III. 


SUMMARY 


Validity coefficients for 13 abbrevi- 
ated individual intelligence tests are 
presented in terms of agreement with 
two criteria, a five-subtest
Wechsler-Bellevue battery, and the Navy General
Classification Test, Form III. The test 
population consisted of 528 Naval re- 
cruits between the ages of 17 and 24. 
The results bear out the previous prom- 
ise of abbreviated scales as serviceable 
measures of intelligence. 


TABLE I 
INTERTEST CORRELATIONS 
TABLE II

MULTIPLE CORRELATIONS* OF ABBREVIATED
SCALES WITH WECHSLER-BELLEVUE
(5 SUBTEST) AND GCT

Test-Battery    W-B    GCT
C-V-S           .87    .86
C-V             .80    .83
V-DS            .77    .81
V-A             .85    .83
V-PA            .80    .80
C-A-PA          .94    .72
S-DS-PA         .94    .75
S-PA            .87    .72

* Multiple correlations calculated by the iteration
method of R. L. Thorndike, adapted from Kelley and
Salisbury.
TABLE III
STANDARD SCORES FOR VOCABULARY

Raw      Standard      Raw      Standard
Score    Score         Score    Score
 15       20             7        9
 14       19             6        8
 13       18             5        6
 12       16             4        5
 11       15             3        3
 10       13             2        2
  9       12             1        1
  8       11
A STUDY OF THE HUNT-MINNESOTA TEST FOR ORGANIC
BRAIN DAMAGE AT THE UPPER LEVELS
OF VOCABULARY¹


By HARRIET JUCKEM AND JANE A. WOLD


UNIVERSITY OF MINNESOTA 


VARIOUS psychologists who have
used the Hunt-Minnesota test for 
the detection of intellectual deteriora- 
tion have formed the clinical impression 
that persons with superior vocabularies 
tend to obtain scores on the test indica- 
tive of deterioration. Hunt himself sug- 
gests that the T scores of subjects near 
the extremes of intelligence be inter- 
preted with extreme caution. Although 
the predicted (P) scores were based on
the assumption of a rectilinear
relationship, an assumption founded on
failure to establish significant
curvilinearity (that is, on failure to
refute the “null hypothesis”),
it was felt that the relationship might
be more truly curvilinear when a suffici- 
ent sampling of the upper levels of vo- 
cabulary was made. The present study 
was done to determine the adequacy of 
the Hunt norms when testing superior 
subjects. 

The Hunt test [2, 3], like the Babcock
[1] and the Shipley [4], basically rests 
on the theory that in deterioration all 
intellectual functions are not affected to 
the same degree. Hunt writes that Bab- 
cock’s assumption that the vocabulary 
level is an indicator of the deteriorated 
subject’s prodromal intellectual level is 
probably only “partly justified in view 


1 Based on papers submitted to the graduate
faculty of the University of Minnesota in
partial fulfillment of the requirements for the 
M.S. and M.A. degrees. Grateful acknowledgment
is given Professor P. E. Meehl for his
[word illegible] assistance in the preparation of this
article. 


of Wesley’s [5] finding that there were 
some evidence that vocabulary scores 
decrease in deterioration.” Hunt as- 
sumes only that persons with organic 
brain damage can be differentiated from 
normals in terms of the discrepancy be- 
tween vocabulary level and performance 
on design and word pair tests of the 
“deteriorating functions”. The score 
that a normal person would be expected 
to attain on the timed deterioration 
tests was predicted from his perform- 
ance on the vocabulary test, taking into 
account the age factor. The discrepancy 
between the prediction and the score ob- 
tained is used as an index of deteriora- 
tion. This index is expressed in T scores, 
where the SD is the standard error of 
estimate about the regression plane. 
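
The index described above can be sketched as follows, assuming the usual T-score convention of a mean of 50 and a spread of 10 points per standard error of estimate. The function name and the numerical inputs are hypothetical; Hunt's actual regression weights and standard error are not given in this paper.

```python
def deterioration_t(predicted, attained, std_error_estimate):
    """T score for the discrepancy between predicted (P) and attained (A)
    scores: 50 means no discrepancy, and each standard error of estimate
    by which A falls short of P adds 10 points."""
    return 50 + 10 * (predicted - attained) / std_error_estimate

# Hypothetical values: P of 100 and a standard error of estimate of 15.
t_on_target = deterioration_t(100, 100, 15)   # attained equals predicted
t_short = deterioration_t(100, 85, 15)        # attained one SE below predicted
```

Under this convention a subject who falls short of prediction obtains a T score above 50, so a prediction set too high will inflate the index even when performance is normal.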

In the present study, two examiners 
administered the Hunt-Minnesota test
to a group of subjects selected, for the 
most part, from elementary and ad- 
vanced psychology classes at the Uni- 
versity of Minnesota, and from the neu- 
ropsychiatric staff of the University of 
Minnesota hospitals. Examiner 1 tested 
30 of the subjects, while Examiner 2 
tested 20. Only those subjects were ac- 
cepted who were not familiar with the 
deterioration test material or the Stan- 
ford-Binet vocabulary. Furthermore, 
any individuals, including combat vet- 
erans, for whom there was any known 
possibility of actual brain pathology 
were eliminated from the study. All 
subjects were volunteers. The first 50 
subjects who successfully defined 30 or
more words on the Stanford-Binet list 
were selected for this study. A critical 
score of 30 words was set because if 
there is a tendency for normal individu- 
als to obtain spuriously high T scores 
because of high vocabulary, it would be 
especially noticeable at this level. De- 
fining 30 words correctly is required on 
the Stanford-Binet test at its highest 
level, Superior Adult III.

Of our 50 subjects, 27 were females 
and 23 were males. No attempt was 
made to see that there was an equal 
number of men and women, as Hunt did 
not believe sex to be an important vari- 
able in test performance. This assump- 
tion was supported by the statistically 
insignificant differences in mean T 
scores which were found between males 
and females. 

Our subjects ranged in age from 18 
to 46 with the mean age of 25.16. The 
mean age of Hunt’s control group was 
considerably higher, being 44.6. How- 
ever, if the effect of age on the deterio- 
ration test score is adequately allowed 
for in computing the T score on a lin- 
ear basis, the age difference between 
the two groups is inconsequential. 

The Hunt test may be given either in 
a long or in a shortened form. Hunt 
reports a correlation of plus .99 between 
the two forms. In view of the magnitude 
of this correlation, we did not feel that 
it was necessary to give the long form 
to our subjects. All scores reported in 
this paper, therefore, are for the short 
form of the test. 


STATISTICAL ANALYSIS OF DATA 


In the statistical analysis of our data, 
when comparisons are made with Hunt’s 
normal group, it should be remembered 
that our group of subjects was not com- 
parable to Hunt’s group, in that ours 
was highly restricted as to vocabulary 
(30 words or above) and as to age.
Nevertheless, these differences for our
purposes should not be significant because
in computing the predicted score of an 
individual, vocabulary and age are sup- 
posedly taken into account. 

The mean age of our group was 25.16 
with a S.D. of 6.10. This group, then, 
is considerably younger than Hunt’s 
normal group, whose mean age was 44.6. 
Our subjects’ mean vocabulary score fell at
35.56 with a S.D. of 3.49. Being both 
superior in vocabulary and younger in 
age, the subjects would have to do ex- 
tremely well on the deterioration tests 
in order to obtain a normal T score. 
However, it is obvious that they did not 
come up to their predicted performance, 
as the mean T score was much elevated, 
being 69.64. A score of this magnitude, 
according to Hunt, would be considered 
suggestive of pathology; that is, only
3 per cent of the non-brain-damaged
population would be expected to obtain a
score this high. Furthermore, 60 per cent
of the subjects had T scores above Hunt’s
critical score of 66 (see Fig. 1).

  Our 50 Normal              Hunt’s 41 Control
       Subjects   T-Scores   Subjects
                  26-30      x
                  31-35      xxx
                  36-40      xxxxx
              x   41-45      xxx
              x   46-50      xxxx
            xxx   51-55      xxxxxxxxxxxx
        xxxxxxx   56-60      xxxxxxxx
       xxxxxxxx   61-65      xxxxx
             xx   66
             xx   67         x
             xx   68         xx
              x   69
             xx   70
             xx   71
              x   72         x
              x   73
            xxx   74
                  75
              x   76
         xxxxxx   77-81
           xxxx   82-86
         xxxxxx   87-91

Fig. 1. Distribution of T Scores for Hunt’s
41 control subjects and for our 50 normal
subjects.

The confidence belt for the mean at 
the 5 per cent level (using Fisher’s t) 
is from 63.29 to 75.99. At the 1 per cent 
level it runs from 61.29 to 77.99. This 
means that the “true mean” is almost
certainly above 61, or more than a full
sigma above the norm mean of 50.
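
A confidence belt of this kind is formed from the sample mean, the sample standard deviation, the number of cases, and the tabled value of Fisher's t for n - 1 degrees of freedom. The sketch below shows the arithmetic with round hypothetical figures; it does not reproduce the limits reported above, since the standard deviation of our T scores is not stated.

```python
from math import sqrt

def confidence_belt(sample_mean, sample_sd, n, t_critical):
    """Two-sided confidence limits for a mean: mean +/- t * SD / sqrt(n).
    t_critical must be looked up for n - 1 degrees of freedom."""
    half_width = t_critical * sample_sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical figures chosen for easy checking, not the study's data.
low, high = confidence_belt(50.0, 10.0, 25, 2.0)
```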

The mean A score (deterioration 
test) for the whole group was 73.58 
with a S.D. of 15.97. The A score is an 
attained score; hence, high values are 
“good”. The experimental group’s mean 
score can in no way be compared with 
Hunt’s data for his control group, as his 
data are only given for the long form. 
Furthermore, our subjects would be 
expected to have a much higher score on 
the deterioration tests on the basis of 
higher vocabulary. Consequently, in or- 
der to get a graphic picture of the devi- 
ation of the obtained A scores from 
those to be expected according to Hunt, 
a scatter plot was made (see Fig. 2). The
theoretical line was determined by
Hunt’s data, using the variables of A
score and vocabulary. In the absence of
data directly applicable to the short
form, the theoretical line was determined
by finding the A score which would
have to be obtained by a person 25 years
old (the mean of our group) at two
vocabulary levels in order to have a T
score of 50. Our cases were then plotted
on the same coordinate system. Inspection
of Fig. 2 reveals that the expected
score was attained or exceeded in only
two cases. Even more important is the
fact that the great majority of scores
are far below the theoretical line. The
mean attained A score now is discovered
to be 29 points lower than the P
predicted from Hunt’s tables.

Fig. 2. Scatter plot representing the
deviation of the attained (A) scores from
the predicted (P) scores.

Little inspection is needed to see that 
there is not even any tendency for the 
obtained scores to group at any one lev- 
el. There are just as many scores at the 
extreme lower part of the plot as there 
are closer to the theoretical line. It may 
be, however, that in spite of our precau- 
tions some of the lowest scores were 
those of persons actually having some 
minimal deterioration. In a group of 50 
persons it is likely that there will be 
found an occasional case of unrecog- 
nized brain damage. But of course this 
could not account for the general trend 
displayed here. It is also probable that 
these persons were affected by other 
factors which influenced their scores. 
The person who feels exceedingly
inadequate when confronted by the design
pairs because he “knows he does poorly 
on tests involving spatial relations,” 
may be influenced by his feelings to the 
extent that he becomes “rattled”. Such 
cases were not eliminated from the 
study because this problem is incurred 
in routine clinical testing. Further- 
more, even when these cases are elimi- 
nated, the discrepancy and scatter of 
scores is still very evident. 

Although the long form of the test 
was not given, and consequently the
interpolated tests which indicate lack of
cooperation were omitted, it is not the 
opinion of the experimenters that lack 
of cooperation was a factor in produc- 
ing the obtained results. In any case, it 
can hardly be assumed that these nor- 
mal, volunteer subjects would as a group 
have more test anxiety or poorer coop- 
eration and motivation than the typical 
clinical patient on whom the test is to 
be used. 

Results of our analysis strongly sug- 
gest that the predicted deterioration 
score for persons with superior
vocabularies is far too high, resulting in a
discrepancy between the expected and
obtained scores and thus in a spuriously
elevated T score. This indicates that the 
basic assumption of rectilinearity of re- 
gression of A on vocabulary is false. 
Our results suggest that if the A and 
vocabulary scores for a large heteroge- 
neous group were plotted, we might ex- 
pect a marked tendency for the graph 
to flatten out as we approach the upper 
end of the scale. This is in agreement 
with the curve form published by Bab- 
cock, using somewhat different deterio- 
rating functions. 

The only possible way to achieve a 
normal mean T score for our group of 
subjects was to reassign a new vocabu- 
lary weighted score. A lower vocabu- 
lary score would bring down the pre- 
dicted score and thus lower the T score. 
Calculation showed that for this group 
of normal persons a vocabulary level of 
21 words needs to be assigned in order 
to yield a mean T score of 50. 
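
The calculation just described amounts to inverting a linear prediction: the vocabulary level is sought at which the predicted score equals the group's mean attained score, so that the mean T score becomes 50. A minimal sketch follows, assuming a prediction of the form P = intercept + b_vocab * V + b_age * age. The function name and all regression weights are hypothetical and are NOT Hunt's published values.

```python
def vocabulary_for_normal_t(mean_attained, mean_age, intercept, b_vocab, b_age):
    """Solve P = intercept + b_vocab * V + b_age * age for the vocabulary
    level V at which the predicted score equals the group's mean attained
    score, so that the mean T score is 50."""
    return (mean_attained - intercept - b_age * mean_age) / b_vocab

# Hypothetical regression weights and scores, chosen only for illustration.
v_star = vocabulary_for_normal_t(
    mean_attained=70.0, mean_age=25.0, intercept=40.0, b_vocab=2.0, b_age=-0.4
)
```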

Hunt’s predicted deterioration test
score is based upon both vocabulary
performance and age: a higher vocabulary
score predicts a higher A score, and
greater age predicts a lower one. The
validity of the latter hypothesis for our
data ought therefore also to be tested. If
a significant relationship between age
and A score does not hold, weighting for
the age factor would contribute to
spuriously raising the T score. The
correlation for our data was -.29 which,
considering our restricted range, is not
inconsistent with the correlation of -.37
yielded by Hunt’s data. The analysis was
carried a step further by breaking down
the data and getting correlations between
age and designs and between age and
words separately. These were -.40 and .00
respectively. This result suggests that
for our group the correction for age
would be necessary only for the design
pairs. Hunt does not include separate
correlations for the two parts of the
deterioration test so we cannot compare
our results with his. The fact that there
is a significant negative relationship for
only the age and design variables might
possibly be accounted for in part by the
more rigorous timing in this part of the
test.

The correlation between age and vo- 
cabulary was computed and a significant 
positive relationship of +.30 was ob- 
tained. Hunt found a +.07 relationship 
between the same two variables which 
is more consistent with the findings of 
previous studies. Our significant posi- 
tive relationship can probably be ex- 
plained by the composition of our sam- 
ple. The older subjects were largely 
graduate students in psychology and 
medical fellows who had survived a long 
process of rigorous educational compe- 
tition. The younger subjects were near- 
ly all sophomore psychology laboratory 
students who had not been exposed to 
as many years of education and natur- 
ally were not nearly as select as a group 
so far as intelligence is concerned. We
may ask how the above conditions affect
our experimental findings. As superior
vocabulary
scores result in a higher predicted A 
score, the fact that for our sample older 
subjects tended to get higher vocabulary 
scores would contribute to raising the 
T scores somewhat.


INTERPRETATION AND CONCLUSIONS 


On the basis of the findings of this 
study, it would appear that the Hunt- 
Minnesota test yields far too many 
“false positives” among persons of high 
vocabulary. Sixty per cent² of our cases
obtained T scores above the critical 
score of 66. This finding is in agreement 
with the finding of Malamud [6]. Fur- 
thermore, above the vocabulary score of 
20, there does not seem to be a rectilin- 
ear relationship between the variables 
of vocabulary score and score in
deterioration tests. As 20 words is the
average score made by adults on the
Stanford-Binet vocabulary list, the above
findings would suggest that there is 
little or no relationship between vocabu- 
lary score and performance on the de- 
terioration tests for individuals of above 
average ability. It would be of great 
value if future research would include 
the testing of normal persons whose vo- 
cabulary was more nearly within the av- 
erage range, for then it would be pos- 
sible to see if the rectilinear relation- 
ship holds at other levels or if it tends 
towards curvilinearity. Incidentally, 
Hunt was able to separate his brain- 
damaged from his control cases almost 
as well using raw deterioration score 
alone as when using difference scores. 

To the best of our knowledge, the dif- 
ferences between Hunt’s normal group’s 
performance and ours cannot be attrib- 
uted to either faulty testing or to a lack 
of cooperation and motivation on the 
part of the subjects. Possibly on a third 
repetition of the design and word pairs, 
the persons with very superior vocabu- 
laries might do markedly better than 
those persons nearer the center of the 
distribution. Thus the long form of the 

2 Hunt’s study included only four subjects
whose vocabularies were greater than 29 words.
These four cases had scores well within
normal limits.


Hunt test might yield a more normal 
distribution of T scores. In this connec- 
tion a re-examination of the reported 
correlation of +.99 between the long 
and short forms of the test might be 
very profitable. 

It is concluded, therefore, that in us- 
ing the Hunt-Minnesota test on individ- 
uals with Stanford-Binet vocabularies 
of 30 or over, only negative significance 
may be attached to the test results. A T 
score above the critical score of 66 may 
not be regarded per se as suggestive of 
deterioration. 


SUMMARY 

1. Only negative significance may be 
attached to the results of the Hunt test 
when administered to persons of super- 
ior vocabularies. 

2. The relationship between A score
and vocabulary score is not rectilinear 
for individuals who have Stanford-Binet 
vocabularies of 30 words or more. 

3. If a normal T score is to be ob- 
tained, the vocabulary score must be 
lowered to 21 words. In other words, 
there seems to be little or no relation- 
ship between ability and performance 
on the deterioration tests for individuals 
of above average ability. 
A NOTE ON THE INTELLIGENCE OF DELINQUENTS
AT INDIANA BOYS SCHOOL¹


By LORENA P. WEDEKING 


INDIANA BOYS SCHOOL 
PLAINFIELD, IND. 


CONTRARY to earlier studies [1,
2], we have found that the greater
number of male juvenile delinquents are
intelligent. Tests given during a period
of thirteen months revealed that only
2.4 per cent of 500 boys could be
diagnosed as feebleminded, while 91.6 per
cent were considered normal. Among
the 91.6 per cent, only 26.6 per cent
were inferior or below average in
intelligence, 47.8 per cent were average,
and 17.2 per cent were of above-average
intelligence. Table I indicates the
distribution of I.Q.’s.

The subjects were boys who entered 
Indiana Boys School from August 1, 
1945 to September 1, 1946 either as 
newly committed by the courts, or re- 
turned for replacement, recidivism, or 
both. The age range was from ten to 


from all parts of the state; 97 boys, 19.4 
per cent, were from the city of In- 
dianapolis. White boys constituted 80 
per cent, colored 20 per cent. Only five 
boys who came to the institution during 
the thirteen-month period could not be 
included in the survey because they 
were not present long enough to be 
tested. 

The Wechsler-Bellevue Scale I and 
the 1937 Terman-Merrill revision of the 
Stanford-Binet Scales were used. Use of 
the Wechsler was begun in April 1946. 
When both scales were given to one boy, 
only the Stanford-Binet I.Q. was used
in the tabulation. 
TABLE I

DISTRIBUTION OF I.Q.’s OF 500 DELINQUENTS

                      I.Q. Range
[illegible]           50-65
[illegible]           67-72
                      66-74
[illegible]           84-89
                      75-90
[illegible]           91-110
Above Average         111-129
                      111-170
Books


BIJOU, SIDNEY W. (Ed.) The psychological
program in AAF convalescent hospitals. 
Army Air Forces Aviation Psychology Pro- 
gram, Report No. 15. Washington: U. S. 
Government Printing Office, 1947. Pp. viii 
+ 256. 


For about one year, October 1944 to October 
1945, an intensive program of clinical psycho- 
logical service and research was carried out in 
the AAF Convalescent Hospitals. About 235 
psychologists participated, in 11 hospitals. 
Chapter 2 of the report describes the service 
program, including orientation, diagnostic eval- 
uation, personal counseling, vocational and 
educational counseling and group therapy. The 
research programs of the hospitals were de- 
signed to invent and develop tools needed for 
the practical services, and to evaluate the re- 
sults of convalescent treatment. The research 
chapters of the report review accomplishments 
in personality and adjustment studies; inter- 
est, attitude, and biographical surveys; the 
measurement of disturbance of mental func- 
tioning; and explorations in projective tech- 
niques. Even in its short career, the AAF con- 
valescent program made a number of signifi- 
cant administrative and research contributions 
to the advancement of clinical psychology in 
hospitals. 


CARROLL, HERBERT A. Mental hygiene. New 
York: Prentice-Hall, 1947. Pp. v + 329. 


This textbook on disorders of personal and
social behavior is intended chiefly for students
who will not become professional psychologists.
In the main, it is eclectic and descriptive, with
little nicety of theory, but numerous
interesting illustrations drawn mainly from the lives
of college students. Although the core of the
book (4 chapters) is devoted to motivation,
there is insufficient attention to genetic
development or cultural influences. The
unqualified statement, “Every individual has the
desire to achieve,” (p. 22) is typical. Other
chapters describe the learning of behavior
disorders, psychoneuroses, psychoses, mental
superiority and deficiency, measurement, educational
mental hygiene, and psychotherapy. The
literary style reads smoothly. There are 11 pages
of questions and exercises, and a bibliography
of 353 titles.


CRAWFORD, MEREDITH P., SOLLENBERGER, RICHARD T.,
WARD, LEWIS B., BROWN, CLARENCE
W., AND GHISELLI, EDWIN E. (Eds.) Psy- 
chological research on operational training 
in the Continental Air Forces. Army Air 
Forces Aviation Psychology Program, Re- 
port No. 16. Washington: U. S. Government 
Printing Office, 1947. Pp. vii + 367. 


The last stage of preparation for Air Force 
personnel is “operational” training, in which 
individuals learn to use combat equipment and 
tactics in an organizational setting, after having
completed basic training in a particular
skill. In World War II, operational training 
was a function of the First, Second, Third and 
Fourth Air Forces, based in the United States. 
Psychologists were assigned to these Air Forces
in 1944, and remained for the rest of the
war. Their first mission was to gather data 
on the efficiency of personnel that could be used 
as criteria to evaluate the tests for the selec- 
tion of air crew. Gradually, the capability of 
psychologists to contribute to other problems 
was recognized, and they participated in fur- 
ther studies of classification and training. The 
report gives an analysis of the duties, criteria 
of proficiency, and validation studies, of each 
air crew specialty. These provide the AAF’s 
best psychological picture of the demands made 
upon men by the tasks of combat flying. There 
are chapters on selection and evaluation of 
lead crews, learning studies, and studies of 
attitudes. 


DUBOIS, PHILIP H. (Ed.) The classification
program. Army Air Forces Aviation Psy- 
chology Program, Report No. 2. Washing- 
ton: U. S. Government Printing Office, 1947. 
Pp. xiv + 394. 


The major task of the AAF psychologists, 
from 1941 to 1945, was the selection and clas- 
sification of personnel for air crew duties. This 
basic report describes the history and organi- 
zation of the program, the batteries of tests 
used, and the results of four years of research 
on the validity of the procedures. Of special 
interest is the complete account of “the experi- 
mental group” of 1300 men placed in pilot 
training in 1943 without psychological selec- 
tion, to test the validity of the procedures, 
free from the influence of the curtailment of 
range of talent. There are also sections on spe- 
cial activities, including the setting up of clas- 
sification methods for the Free French and 
Philippine Air Forces. The volume is richly il- 
lustrated by charts and tables. Appendices give 
tables of testing statistics and a list of the 
more than one thousand officers and enlisted 
men who worked in the classification program. 


GESELL, ARNOLD AND AMATRUDA, CATHERINE 
S. Developmental diagnosis. (2nd Ed.) New 
York: Paul B. Hoeber, 1947. Pp. xvi + 496. 


Like the first edition (1941) this is a prac- 
tical book for those concerned with the clinical 
diagnosis of infant development. The longest 
chapter is a clear description of developmental 
norms at 4, 16, 28, and 40 weeks, and 12, 18, 
24, and 36 months, illustrated by admirable 
line drawings from motion picture records. De- 
velopment is traced in the motor, adaptive 
(sensori-motor), language, and personal-social 
behavior areas. Differential diagnosis of re- 
tardation and amentia, of endocrine, convul- 
sive, neurological and other defects and devia- 
tions are described precisely, with experimen- 
tal evidence and illustrative studies. There is 
a manual for the materials, procedures and 
interpretation of the Gesell developmental examination.
Amid all this excellent detail, one
has a feeling that infant growth is being 
watched as if it were the development of a 
complex and entrancing little machine. “Love” 
and “warmth” are not even in the index of the 
book. Emotional behavior is indexed only as 
a sign suggestive of deafness in infants and 
young children! Without plunging into the opposite
extreme of either sentimental or psychoanalytic
excesses, Gesell’s work would
profit from a greater regard for the feeling-tones
of infancy and for the emotional aspects
of the social interaction between infant and
adult. A blend of the hypotheses of Ribble or
Spitz, with the precise observation and description
of Gesell, might make a real contribution
to our understanding not only of the
infant but of man throughout his life.


MURPHY, GARDNER. Personality, a biosocial ap- 
proach to origins and structure. New York: 
Harper, 1947. Pp. xii + 999. 


Murphy’s Personality is a very great book, 
that brings the spark of synthesis to a mass 
of significant but hitherto ill-related knowl- 
edge. It will remain an essential to the well- 
educated psychologist for some time to come. 
The integrated approach starts with the bio- 
logical organism, and proceeds in orderly turn 
to learning as the organism’s first interaction 
with specific environment, to perception and 
thought arising from learning, to the percep- 
tion of self, conscious and unconscious, in con- 
flict and in unity, thence to the organization 
of self, and last to the broader wholeness of 
self and culture. In the main, this sweeping 
plan gives unity to the entire volume; it is 
rarely lost sight of. A few chapters, such as 
those on Adler and Jung, and on projective 
techniques, are bound to the organization by 
slender threads, but what author can resist the 
temptation to include some of his favorite lec- 
tures in his book when even reasonably appro- 
priate? Other virtues add their unneeded 
weight to the scope and scholarliness of the 
book. The literary style is superb—you will 
find yourself reading passages aloud with glee. 
The typography and paper are feasts for the 
eye and hand. References are segregated in 
the back of the book, with citations of page, 
paragraph and line, to replace footnotes or 
reference numbers. The bibliography has 749 
entries, and the index is also a glossary. 


MURSELL, JAMES L. Psychological testing. New 
York: Longmans, Green, 1947. Pp. xiv + 
449. 


Mursell has produced an informative, well- 
organized textbook for the introductory course 
in psychological measurement. It is almost 
wholly descriptive. Theory is given scant at- 
tention; statistical methods are left to other 
treatises. On the whole the choice of illustra- 
tive tests is sound, and the information about 
them is up to date and well documented. There 
seem to be a few slips: a highly commercialized
questionnaire’s restricted distribution
“adds to one’s confidence in it,” and the Kuder is
unaccountably missing among several less important
interest measures. Projective techniques,
perhaps for the best at this level, are
given only brief mention in a final chapter on
“evolution and improvement,” along with factor
analysis and profile scoring. There are
separate bibliographies of references and of
tests.


PAGE, JAMES D. Abnormal psychology. New
York: McGraw-Hill, 1947. Pp. xvii + 441. 


Designed for an introductory course, this is 
a capably written, middle-of-the-road textbook. 
The descriptive and illustrative material re- 
veals the author’s first-hand experience, and 
will leave the student with clear impressions 
of the phenomena of mental disorder. The 
theoretical approaches are handled less ade- 
quately. Except for an early chapter on moti- 
vation and adjustment, theories are fragment- 
ed throughout the sections on the types of dis- 
orders. Psychoanalysis is treated in a sepa- 
rate chapter and somewhat unfavorably. Con- 
stitutional and physiological etiologies are han- 
dled fairly adequately, psychological ones less 
so, while cultural factors are almost ignored. 
No mention is made of the rich new literature 
on experimental studies of conflict. On the 
whole, however, the book does not compare un- 
favorably with most elementary texts in this 
area. 


PEATMAN, JOHN G. Descriptive and sampling 
statistics. New York: Harper, 1947. Pp. 
xviii + 577. 


As a text for a first whole-year course in 
statistics, Peatman’s book treats both the sta- 
tistics of description, and of the confidence lim- 
its of inferences from statistics obtained from 
samples. A feature, found in relatively few 
familiar texts since Yule, is a fairly full treat- 
ment of the statistics of attributes or cate- 
gories, with applications to problems of recent 
interest in opinion and market research. Although
applied and not basically mathematical,
the treatment is more symbolic and systematic
than in most elementary texts. Students
with little mathematical preparation will be
frightened. For the able, the book gives an 
adequate introduction to all but the most com- 
plex statistics used in psychology. Attention 
is paid to logic and design as well as to cal- 
culations. An appendix of 34 pages of statis- 
tical tables provides for the needs of students. 


RUNES, DAGOBERT D. (Ed.) The selected writings
of Benjamin Rush. New York: Philosophical
Library, 1947. Pp. xii + 433.


Benjamin Rush (1745-1813), physician,
founding father of both his country and his 
profession, was especially a pioneer in psycho- 
pathology. This volume brings together some 
of his writing on government, education, and 
the sciences. Psychologists will have particu- 
lar interest in his essays, “the influence of 
physical causes on the moral faculty,” and “on 
the different species of mania.” They reflect 
how much we have progressed in one hundred 
fifty years—and how much we have not. 


TIFFIN, JOSEPH. Industrial psychology. (2nd
Ed.) New York: Prentice-Hall, 1947. Pp.
xxi + 553. 


The revised edition contains two new chap- 
ters, on the interview, and on wages and job 
evaluation. The remaining chapters have been 
revised, with some addition of material not
covered by the original edition (1942). Sur- 
prisingly, there seems to be little new data 
drawn from the work of the armed forces dur- 
ing the war. There is also no mention of em- 
ployee counseling, but perhaps this topic is 
clinical psychology rather than industrial psychology.
In the main, however, Tiffin remains
one of the most clear and meaty books on in- 
dustrial testing, training and evaluating. There 
is an appendix on elementary statistical pro- 
cedures, and another giving the Taylor-Russell 
Tables on efficiency of selection by tests. 


TOMKINS, SILVAN S. The Thematic Appercep- 
tion Test. New York: Grune & Stratton, 
1947. Pp. ix + 297. 


Tomkins’ book is a substantial contribution
to the literature of the TAT, perhaps the first 
sizable book intended primarily as a manual 
for this test. Chapters review the history and 
development of the TAT, administration, scor- 
ing (which departs somewhat from usual meth- 
ods), and level analysis. The major part of 
the volume is then devoted to diagnosis in the 
regions of family, of love, sex and marriage, 
of social relationships, and of vocation. The 
final chapter explores the use of the TAT in 
therapy. Throughout, the book is exception- 
ally rich in clinical illustrations that will be 
of value both to the student and to the experi- 
enced worker. The bibliography has 110 en- 
tries, a tribute to the vitality of a technique 
scarcely twelve years old. 


TYLER, LEONA E. The psychology of human
differences. New York: Appleton-Century, 
1947. Pp. xiii + 420. 


The most recent member of the Century Psychological
Series is an elementary, but by no
means a superficial, textbook on differential
psychology. There is unusually careful atten- 
tion to the methods and logic of investigation, 
and to problems of sampling. Citations of re- 
search are not only numerous, but are clearly 
and critically handled. Part I deals with basic 
principles; Part II with the major group dif- 
ferences: sex, race, class, age, the feeblemind- 
ed and the gifted. Part III surveys the rela- 
tionships of mental and physical traits, the ef- 
fect of practice, and the heredity-environment 
problem. Part IV is on some problems of meas- 
urement technique and statistics. 


TESTS 


Aptitude Test for Elementary School Teachers- 
In-Training, by H. Bowers. Normal School 
Applicants (Canada). 1 form. Parts I-V, 
(40) min.; Part VI, Rating of performance; 
Part VII, High School percentile. Test
forms, manual. J. M. Dent & Sons (Canada) 
Ltd., Stratford, Ont., 1946. 


This is an unorthodox and interesting ap- 
proach to the preselection of students for 
teacher training. Parts I-V consist of 76 atti- 
tude-interest questionnaire items selected from 
hundreds, against the empirical criterion of 
success in practice teaching. Part VI is a sys- 
tematic rating of the pooled evaluation of four 
judges who listen to a short oral presentation 
made by the applicant before a class. Part 
VII is a computation of high school percentile, 
applicable in Ontario only. The 47-page mime- 
ographed manual gives directions and an ac- 
count of the considerable amount of research 
underlying the test. 


Differential Aptitude Tests, by G. K. Bennett, 
H. G. Seashore, and A. G. Wesman. 7 tests: 
verbal reasoning, numerical ability, abstract 
reasoning, space relations, mechanical reasoning,
clerical speed and accuracy, language
usage (spelling and sentences). Grades 8- 
12. 2 forms of each test (except mechanical 
reasoning). IBM. 25-35 (30-40) min. each 
test, except 6 (10) min. for clerical speed 
and accuracy. Test blanks, IBM answer 
sheets, keys, looseleaf manual. Psychologi- 
cal Corporation, New York, N. Y., 1947. 


The publication of Differential Aptitude Tests 
is a major psychometric event. The battery 
stresses the significance of abilities rather than
“ability” as the basis for prediction and guid- 
ance at the secondary school level. The parts, 
other than the clerical, are power tests rather 
than speed tests. Average reliabilities (except 
for that of girls on mechanical reasoning, 
which is .71) range from .85 to .93. Separate 
percentile norms are given for boys and girls 
from grades 8 to 12, based on national selec- 
tions of from 750 to 2000 cases for each grade- 
sex group for Form A, 350 to 1100 for Form 
B. Profiles of percentiles and standard scores 
are drawn, and illustrative case studies offer 
some assistance to counselors in the use of re- 
sults. The loose-leaf manual is convenient for 
reference, and for the addition of further data 
as they become available. Although there are 
many immediate applications for tests of this 
type, much research is needed on the validity 
of profiles for predicting various sorts of edu- 
cational and vocational success. 


Metropolitan Achievement Tests, Forms R, S, 
T and U. By G. H. Hildreth, R. D. Allen, 
H. H. Bixler, W. L. Connor, and F. B. Graham.
5 levels: Primary I, Gr. 1, 45 (60)
min.; Primary II, Gr. 2, 85 (100) min.; Elementary,
Gr. 3-4, 185 (150) min.; Intermediate,
Gr. 5-6, 215 (240) min.; Advanced,
Gr. 7-9, 230 (255) min. 4 forms at each level.
Test blanks, manuals, keys, class record
forms. World Book Co., Yonkers, N. Y., 1946.


This extensive set of elementary school 
achievement tests is available either as com- 
plete batteries or, at the grade 5 to 9 levels, as 
separate subject tests. The selection of sub- 
ject matter was based on curriculum studies 
and on extensive item analyses. Subtest reli- 
abilities range from .80 to .97, and are mainly 
above .90. National norms permit expressing 
results in terms of standard scores, age equivalents
and grade equivalents. This excellent revision
of a long-used achievement battery will
find many educational applications and some 
clinical ones. 


Number Fact Check Sheet, by R. Cochrane.
Grades 5-8. 2 forms. IBM. (24) min. Test 
blank, manual, key. California Test Bureau, 
Los Angeles, Calif., 1947. 


A diagnostic-survey test, containing the 100 
addition, 100 subtraction, 100 multiplication 
and 90 division facts of arithmetic. Responses 
are made by marking out wrong answers, and
may be directly IBM machine scored. Kuder- 
Richardson reliability is .93. Percentile norms 
are given for grades 5 to 8. 


Thurstone Interest Schedule, by L. L. Thur- 
stone. High school-adult. 1 form. Untimed, 
(10) min. Blank, manual. Psychological 
Corporation, New York, N. Y., 1947. 


The new interest schedule replaces the 
Thurstone Vocational Interest Schedule (1937). 
The blank consists of 100 squares, each containing
2 occupation names of which the examinee
may indicate his preference, or may
accept or reject both. Ten interest groups are 
represented: physical science, biological sci- 
ence, computational, business, executive, per- 
suasive, linguistic, humanitarian, artistic, and 
musical. Scores from 0 to 20 are obtained for 
each group, and are plotted as a profile. Scale 
reliabilities (odd-even, corrected) range from 
.90 to .96. Scale intercorrelations are given, 
and some correlations with the Kuder inventory.
A wide use in vocational guidance may
be predicted. 